TWELFTH EDITION
330 Hudson Street, NY NY 10013
Senior Vice President, Courseware Portfolio Management: Marcia Horton
Director, Portfolio Management: Engineering, Computer Science & Global Editions: Julian Partridge
Specialist, Higher Ed Portfolio Management: Matt Goldstein
Portfolio Management Assistant: Meghan Jacoby
Managing Content Producer: Scott Disanno
Content Producer: Carole Snyder
Web Developer: Steve Wright
Rights and Permissions Manager: Ben Ferrini
Manufacturing Buyer, Higher Ed, Lake Side Communications Inc (LSC): Maura Zaldivar-Garcia
Inventory Manager: Ann Lam
Product Marketing Manager: Yvonne Vannatta
Field Marketing Manager: Demetrius Hall
Marketing Assistant: Jon Bryant
Cover Designer: Joyce Wells, jWellsDesign
Full-Service Project Manager: Prathiba Rajagopal, SPi Global
Composition: SPi Global
Copyright © 2019, 2016, 2013, 2010 Pearson Education, Inc. All rights reserved. Manufactured in the United States of America. This publication is protected by copyright, and permission should be obtained from the publisher prior to any prohibited reproduction, storage in a retrieval system, or transmission in any form or by any means, electronic, mechanical, photocopying, recording, or likewise. For information regarding permissions, request forms and the appropriate contacts within the Pearson Education Global Rights & Permissions department, please visit http:/
Many of the designations by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and the publisher was aware of a trademark claim, the designations have been printed in initial caps or all caps.
The author and publisher of this book have used their best efforts in preparing this book. These efforts include the development, research, and testing of theories and programs to determine their effectiveness. The author and publisher make no warranty of any kind, expressed or implied, with regard to these programs or the documentation contained in this book. The author and publisher shall not be liable in any event for incidental or consequential damages with, or arising out of, the furnishing, performance, or use of these programs.
Library of Congress Cataloging-in-Publication Data
Title: Concepts of programming languages / Robert W. Sebesta, University of Colorado at Colorado Springs.
Description: Twelfth edition. | Pearson, [2019]
Identifiers: LCCN 2017059077 | ISBN 9780134997186 (alk. paper) | ISBN 0134997182 (alk. paper)
Subjects: LCSH: Programming languages (Electronic computers)
Classification: LCC QA76.7 .S43 2019 | DDC 005.13--dc23
LC record available at https://lccn.loc.gov/2017059077
1 18
ISBN 10: 0-13-499718-2
ISBN 13: 978-0-13-499718-6
Chapter 2: Added Section 2.16.4 A Replacement for Objective-C: Swift
Added Section 2.16.5 Another Related Language: Delphi
Deleted Section 2.18.6 Origins and Characteristics of Lua
Chapter 5: Rewrote several paragraphs in Section 5.5.3 to correct and clarify
Chapter 6: Added a paragraph to Section 6.3.2 to describe support for strings in Swift
Added a paragraph to Section 6.4.2 to describe support for enumeration types in Swift
Added a paragraph to Section 6.5.3 to describe support for arrays in Swift
Added a paragraph to Section 6.6.1 to describe support for associative arrays in Swift
Deleted the interview in Section 6.6.1
Added Section 6.12 Optional Types
Chapter 8: Added a design issue and a brief discussion of it to Section 8.3.1.1
Added several paragraphs to Section 8.3.4 that describe iterators in Python
Chapter 9: Added a paragraph to Section 9.5.4 on Swift parameters
Chapter 11: Deleted Section 11.4.2 (Abstract Data Types in Objective-C)
Chapter 12: Deleted Section 12.4.5 (Objective-C)
Deleted Objective-C column from Table 12.1
Added a paragraph in the Summary on reflection
The goals, overall structure, and approach of this twelfth edition of Concepts of Programming Languages remain the same as those of the eleven previous editions. The principal goals are to introduce the fundamental constructs of contemporary programming languages and to provide the reader with the tools necessary for the critical evaluation of existing and future programming languages. A secondary goal is to prepare the reader for the study of compiler design, by providing an in-depth discussion of programming language structures, presenting a formal method of describing syntax, and introducing approaches to lexical and syntax analysis.
The twelfth edition evolved from the eleventh through several different kinds of changes. To maintain the currency of the material, nearly all discussion of some programming languages, specifically Lua and Objective-C, has been removed. Material on the newer language, Swift, was added to several chapters.
In addition, a new section on optional types was added to Chapter 6. Material was added to Section 8.3.4 to describe iterators in Python. In numerous places in the manuscript small changes were made to correct and/or clarify the discussion.
This book describes the fundamental concepts of programming languages by discussing the design issues of the various language constructs, examining the design choices for these constructs in some of the most common languages, and critically comparing design alternatives.
Any serious study of programming languages requires an examination of some related topics, among which are formal methods of describing the syntax and semantics of programming languages, which are covered in Chapter 3. Also, implementation techniques for various language constructs must be considered: Lexical and syntax analysis are discussed in Chapter 4, and implementation of subprogram linkage is covered in Chapter 10. Implementation of some other language constructs is discussed in various other parts of the book.
The following paragraphs outline the contents of the twelfth edition.
Chapter 1 begins with a rationale for studying programming languages. It then discusses the criteria used for evaluating programming languages and language constructs. The primary influences on language design, common design trade-offs, and the basic approaches to implementation are also examined.
Chapter 2 outlines the evolution of the languages that are discussed in this book. Although no attempt is made to describe any language completely, the origins, purposes, and contributions of each are discussed. This historical overview is valuable because it provides the background necessary for understanding the practical and theoretical basis of contemporary language design. It also motivates further study of language design and evaluation. Because none of the remainder of the book depends on Chapter 2, it can be read on its own, independent of the other chapters.
Chapter 3 describes BNF, the primary formal method for describing the syntax of programming languages. This is followed by a description of attribute grammars, which describe both the syntax and static semantics of languages. The difficult task of semantic description is then explored, including brief introductions to the three most common methods: operational, denotational, and axiomatic semantics.
Chapter 4 introduces lexical and syntax analysis. This chapter is targeted to those Computer Science departments that no longer require a compiler design course in their curricula. Similar to Chapter 2, this chapter stands alone and can be studied independently of the rest of the book, except for Chapter 3, on which it depends.
Chapters 5 through 14 describe in detail the design issues for the primary constructs of programming languages. In each case, the design choices for several example languages are presented and evaluated. Specifically, Chapter 5 covers the many characteristics of variables, Chapter 6 covers data types, and Chapter 7 explains expressions and assignment statements. Chapter 8 describes control statements, and Chapters 9 and 10 discuss subprograms and their implementation. Chapter 11 examines data abstraction facilities. Chapter 12 provides an in-depth discussion of language features that support object-oriented programming (inheritance and dynamic method binding), Chapter 13 discusses concurrent program units, and Chapter 14 is about exception handling, along with a brief discussion of event handling.
Chapters 15 and 16 describe two of the most important alternative programming paradigms: functional programming and logic programming. However, some of the data structures and control constructs of functional programming languages are discussed in Chapters 6 and 8. Chapter 15 presents an introduction to Scheme, including descriptions of some of its primitive functions, special forms, and functional forms, as well as some examples of simple functions written in Scheme. Brief introductions to ML, Haskell, and F# are given to illustrate some different directions in functional language design. Chapter 16 introduces logic programming and the logic programming language, Prolog.
Chapters 1 and 3 are typically covered in detail, and though students find it interesting and beneficial reading, Chapter 2 receives little lecture time due to its lack of hard technical content. Because no material in subsequent chapters depends on Chapter 2, as noted earlier, it can be skipped entirely. If a course in compiler design is required, Chapter 4 is not covered.
Chapters 5 through 9 should be relatively easy for students with extensive programming experience in C++, Java, or C#. Chapters 10 through 14 are more challenging and require more detailed lectures.
Chapters 15 and 16 are entirely new to most students at the junior level. Ideally, language processors for Scheme and Prolog should be available for students required to learn the material in these chapters. Sufficient material is included to allow students to dabble with some simple programs.
Undergraduate courses will probably not be able to cover all of the material in the last two chapters. Graduate courses, however, should be able to completely discuss the material in those chapters by skipping over some parts of the early chapters on imperative languages.
The following supplements are available to all readers of this book at www.pearson.com/cs-resources.
A set of lecture note slides. PowerPoint slides are available for each chapter in the book.
All of the figures from the book.
A companion Web site to the book is available at www.pearson.com/cs-resources. This site contains mini-manuals (approximately 100-page tutorials) on a handful of languages.
Solutions to many of the problem sets are available to qualified instructors in our Instructor Resource Center at www.pearson.com. Please contact your school’s Pearson representative or visit www.pearson.com to register.
Processors for and information about some of the programming languages discussed in this book can be found at the following Web sites:
JavaScript is included in virtually all browsers; PHP is included in virtually all Web servers.
All this information is also included on the companion Web site.
The suggestions from outstanding reviewers contributed greatly to this book’s present form and contents. In alphabetical order, they are:
Numerous other people provided input for the previous editions of Concepts of Programming Languages at various stages of its development. All of their comments were useful and greatly appreciated. In alphabetical order, they are: Vicki Allan, Henry Bauer, Carter Bays, Manuel E. Bermudez, Peter Brouwer, Margaret Burnett, Paosheng Chang, Liang Cheng, John Crenshaw, Charles Dana, Barbara Ann Griem, Mary Lou Haag, John V. Harrison, Eileen Head, Ralph C. Hilzer, Eric Joanis, Leon Jololian, Hikyoo Koh, Jiang B. Liu, Meiliu Lu, Jon Mauney, Robert McCoard, Dennis L. Mumaugh, Michael G. Murphy, Andrew Oldroyd, Young Park, Rebecca Parsons, Steve J. Phelps, Jeffery Popyack, Steven Rapkin, Hamilton Richard, Tom Sager, Raghvinder Sangwan, Joseph Schell, Sibylle Schupp, Mary Louise Soffa, Neelam Soundarajan, Ryan Stansifer, Steve Stevenson, Virginia Teller, Yang Wang, John M. Weiss, Franck Xia, and Salih Yurnas.
Matt Goldstein, Portfolio Management Specialist; Meghan Jacoby, Portfolio Management Assistant; Scott Disanno, Managing Content Producer; and Prathiba Rajagopal all deserve my gratitude for their efforts to produce the twelfth edition both quickly and carefully.
Robert Sebesta is an Associate Professor Emeritus in the Computer Science Department at the University of Colorado–Colorado Springs. Professor Sebesta received a BS in applied mathematics from the University of Colorado in Boulder and MS and PhD degrees in computer science from Pennsylvania State University. He has taught computer science for more than 40 years. His professional interests are the design and evaluation of programming languages and Web programming.
Before we begin discussing the concepts of programming languages, we must consider a few preliminaries. First, we explain some reasons why computer science students and professional software developers should study general approaches to language design and evaluation. This discussion is especially valuable for those who believe that a working knowledge of one or two programming languages is sufficient for computer scientists. Then, we briefly describe the major programming domains. Next, because the book evaluates language constructs and features, we present a list of criteria that can serve as a basis for such judgments. Then, we discuss the two major influences on language design: machine architecture and program design methodologies. After that, we introduce the major categories of programming languages. Next, we describe a few of the most important trade-offs that must be considered during language design.
Because this book is also about the implementation of programming languages, this chapter includes an overview of the most common general approaches to implementation. Finally, we briefly describe a few examples of programming environments and discuss their impact on software production.
It is natural for students to wonder how they will benefit from the study of programming language concepts. After all, many other topics in computer science are worthy of serious study. In fact, many now believe that there are more important areas of computing for study than can be covered in a four-year college curriculum. The following is what we believe to be a compelling list of potential benefits of studying concepts of programming languages:
Increased capacity to express ideas. It is widely believed that the depth at which people can think is influenced by the expressive power of the language in which they communicate their thoughts. Those with only a weak understanding of natural language are limited in the complexity of their thoughts, particularly in depth of abstraction. In other words, it is difficult for people to conceptualize structures they cannot describe, verbally or in writing.
Programmers, in the process of developing software, are similarly constrained. The language in which they develop software places limits on the kinds of control structures, data structures, and abstractions they can use; thus, the forms of algorithms they can construct are likewise limited. Awareness of a wider variety of programming language features can reduce such limitations in software development. Programmers can increase the range of their software development thought processes by learning new language constructs.
It might be argued that learning the capabilities of other languages does not help a programmer who is forced to use a language that lacks those capabilities. That argument does not hold up, however, because language constructs can often be simulated in other languages that do not support those constructs directly. For example, a C (Harbison and Steele, 2002) programmer who had learned the structure and uses of associative arrays in Perl (Christianson et al., 2013) might design structures that simulate associative arrays in C. In other words, the study of programming language concepts builds an appreciation for valuable language features and constructs and encourages programmers to use them, even when the language they are using does not directly support such features and constructs.
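The idea of simulating an associative array can be sketched briefly. The following is a minimal illustration, not taken from the book, and it is written in Python only for brevity: it deliberately avoids Python's built-in dictionary and uses nothing but fixed-size arrays and integer indexing, the facilities a C programmer would have. The names `CAPACITY`, `make_table`, `put`, and `get`, and the sample data, are hypothetical.

```python
# Hypothetical illustration: an associative array built only from
# fixed-size arrays and integer indexing, as a C programmer might do.

CAPACITY = 8  # fixed table size; this sketch never resizes or deletes


def make_table():
    """Return two parallel arrays: one for keys, one for values."""
    return [None] * CAPACITY, [None] * CAPACITY


def _hash(key):
    """Simple string hash (sum of character codes) reduced to a slot."""
    return sum(ord(ch) for ch in key) % CAPACITY


def put(table, key, value):
    """Store value under key, probing linearly past occupied slots."""
    keys, values = table
    slot = _hash(key)
    for _ in range(CAPACITY):
        if keys[slot] is None or keys[slot] == key:
            keys[slot] = key
            values[slot] = value
            return
        slot = (slot + 1) % CAPACITY
    raise OverflowError("table is full")


def get(table, key, default=None):
    """Look up key using the same probe sequence as put."""
    keys, values = table
    slot = _hash(key)
    for _ in range(CAPACITY):
        if keys[slot] is None:
            return default  # an empty slot ends the probe sequence
        if keys[slot] == key:
            return values[slot]
        slot = (slot + 1) % CAPACITY
    return default


salaries = make_table()
put(salaries, "Gary", 75000)
put(salaries, "Perry", 57000)
put(salaries, "Gary", 76000)  # overwrites the earlier entry for "Gary"
```

Even this small sketch has to handle hashing, collisions, and a full table by hand, which suggests why a simulated feature tends to be more cumbersome than one built into the language.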
Improved background for choosing appropriate languages. Some professional programmers have had little formal education in computer science; rather, they have developed their programming skills independently or through in-house training programs. Such training programs often limit instruction to one or two languages that are directly relevant to the current projects of the organization. Other programmers received their formal training years ago. The languages they learned then are no longer widely used, and many features now available in programming languages were not commonly known at the time. The result is that many programmers, when given a choice of languages for a new project, use the language with which they are most familiar, even if it is poorly suited for the project at hand. If these programmers were familiar with a wider range of languages and language constructs, they would be better able to choose the language with the features that best address the problem.
Some of the features of one language often can be simulated in another language. However, it is preferable to use a feature whose design has been integrated into a language than to use a simulation of that feature, which is often less elegant, more cumbersome, and less safe.
Increased ability to learn new languages. Computer programming is still a relatively young discipline, and design methodologies, software development tools, and programming languages are still in a state of continuous evolution. This makes software development an exciting profession, but it also means that continuous learning is essential. The process of learning a new programming language can be lengthy and difficult, especially for someone who is comfortable with only one or two languages and has never examined programming language concepts in general. Once a thorough understanding of the fundamental concepts of languages is acquired, it becomes far easier to see how these concepts are incorporated into the design of the language being learned. For example, programmers who understand the concepts of object-oriented programming will have a much easier time learning Ruby (Thomas et al., 2013) than those who have never used those concepts.
The same phenomenon occurs in natural languages. The better you know the grammar of your native language, the easier it is to learn a second language. Furthermore, learning a second language has the benefit of teaching you more about your first language.
The TIOBE Programming Community issues an index (http:/) that is an indicator of the relative popularity of programming languages. For example, according to the index, Java, C, C++ (Lippman et al., 2012), and C# (Albahari and Abrahari, 2012) were the four most popular languages in use in February 2017. However, dozens of other languages were widely used at the time. The index data also show that the distribution of usage of programming languages is always changing. The number of languages in use and the dynamic nature of the statistics imply that every software developer must be prepared to learn different languages.
Finally, it is essential that practicing programmers know the vocabulary and fundamental concepts of programming languages so they can read and understand programming language descriptions and evaluations, as well as promotional literature for languages and compilers. These are the sources of information needed in order to choose and learn a language.
Better understanding of the significance of implementation. In learning the concepts of programming languages, it is both interesting and necessary to touch on the implementation issues that affect those concepts. In some cases, an understanding of implementation issues leads to an understanding of why languages are designed the way they are. In turn, this knowledge leads to the ability to use a language more intelligently, as it was designed to be used. We can become better programmers by understanding the choices among programming language constructs and the consequences of those choices.
Certain kinds of program bugs can be found and fixed only by a programmer who knows some related implementation details. Another benefit of understanding implementation issues is that it allows us to visualize how a computer executes various language constructs. In some cases, some knowledge of implementation issues provides hints about the relative efficiency of alternative constructs that may be chosen for a program. For example, programmers who know little about the complexity of the implementation of subprogram calls often do not realize that a small subprogram that is frequently called can be a highly inefficient design choice.
Because this book touches on only a few of the issues of implementation, the previous two paragraphs also serve well as rationale for studying compiler design.
Better use of languages that are already known. Most contemporary programming languages are large and complex. Accordingly, it is uncommon for a programmer to be familiar with and use all of the features of a language he or she uses. By studying the concepts of programming languages, programmers can learn about previously unknown and unused parts of the languages they already use and begin to use those features.
Overall advancement of computing. Finally, there is a global view of computing that can justify the study of programming language concepts. Although it is usually possible to determine why a particular programming language became popular, many believe, at least in retrospect, that the most popular languages are not always the best available. In some cases, it might be concluded that a language became widely used, at least in part, because those in positions to choose languages were not sufficiently familiar with programming language concepts.
For example, many people believe it would have been better if ALGOL 60 (Backus et al., 1963) had displaced Fortran (ISO/IEC 1539-1, 2010) in the early 1960s, because it was more elegant and had much better control statements, among other reasons. That it did not is due partly to the programmers and software development managers of that time, many of whom did not clearly understand the conceptual design of ALGOL 60. They found its description difficult to read (which it was) and even more difficult to understand. They did not appreciate the benefits of block structure, recursion, and well-structured control statements, so they failed to see the benefits of ALGOL 60 over Fortran.
Of course, many other factors contributed to the lack of acceptance of ALGOL 60, as we will see in Chapter 2. However, the fact that computer users were generally unaware of the benefits of the language played a significant role.
In general, if those who choose languages were well informed, perhaps better languages would eventually squeeze out poorer ones.
Computers have been applied to a myriad of different areas, from controlling nuclear power plants to providing video games in mobile phones. Because of this great diversity in computer use, programming languages with very different goals have been developed. In this section, we briefly discuss a few of the most common areas of computer applications and their associated languages.
The first digital computers, which appeared in the late 1940s and early 1950s, were invented and used for scientific applications. Typically, the scientific applications of that time used relatively simple data structures, but required large numbers of floating-point arithmetic computations. The most common data structures were arrays and matrices; the most common control structures were counting loops and selections. The early high-level programming languages invented for scientific applications were designed to provide for those needs. Their competition was assembly language, so efficiency was a primary concern. The first language for scientific applications was Fortran. ALGOL 60 and most of its descendants were also intended to be used in this area, although they were designed to be used in related areas as well. For some scientific applications where efficiency is the primary concern, such as those that were common in the 1950s and 1960s, no subsequent language is significantly better than Fortran, which explains why Fortran is still used.
The use of computers for business applications began in the 1950s. Special computers were developed for this purpose, along with special languages. The first successful high-level language for business was COBOL (ISO/IEC, 2002), the initial version of which appeared in 1960. It probably still is the most commonly used language for these applications. Business languages are characterized by facilities for producing elaborate reports, precise ways of describing and storing decimal numbers and character data, and the ability to specify decimal arithmetic operations.
There have been few developments in business application languages outside the development and evolution of COBOL. Therefore, this book includes only limited discussions of the structures in COBOL.
Artificial intelligence (AI) is a broad area of computer applications characterized by the use of symbolic rather than numeric computations. Symbolic computation means that symbols, consisting of names rather than numbers, are manipulated. Also, symbolic computation is more conveniently done with linked lists of data rather than arrays. This kind of programming sometimes requires more flexibility than other programming domains. For example, in some AI applications the ability to create and execute code segments during execution is convenient.
The first widely used programming language developed for AI applications was the functional language Lisp (McCarthy et al., 1965), which appeared in 1959. Most AI applications developed prior to 1990 were written in Lisp or one of its close relatives. During the early 1970s, however, an alternative approach to some of these applications appeared—logic programming using the Prolog (Clocksin and Mellish, 2013) language. More recently, some AI applications have been written in systems languages such as Python (Lutz, 2013). Scheme (Dybvig, 2011), a dialect of Lisp, and Prolog are introduced in Chapters 15 and 16, respectively.
The World Wide Web is supported by an eclectic collection of languages, ranging from markup languages, such as HTML, which is not a programming language, to general-purpose programming languages, such as Java. Because of the pervasive need for dynamic Web content, some computation capability is often included in the technology of content presentation. This functionality can be provided by embedding programming code in an HTML document. Such code is often in the form of a scripting language, such as JavaScript (Flanagan, 2011) or PHP (Tatroe et al., 2013). There are also some markup-like languages that have been extended to include constructs that control document processing, which are discussed in Section 1.5 and in Chapter 2.
As noted previously, the purpose of this book is to examine carefully the underlying concepts of the various constructs and capabilities of programming languages. We will also evaluate these features, focusing on their impact on the software development process, including maintenance. To do this, we need a set of evaluation criteria. Such a list of criteria is necessarily controversial, because it is difficult to get even two computer scientists to agree on the value of some given language characteristic relative to others. In spite of these differences, most would agree that the criteria discussed in the following subsections are important.
Some of the characteristics that influence three of the four most important of these criteria are shown in Table 1.1, and the criteria themselves are discussed in the following sections.2 Note that only the most important characteristics are included in the table, mirroring the discussion in the following subsections. One could probably make the case that if one considered less important characteristics, virtually all table positions could include “bullets.”
Note that some of these characteristics are broad and somewhat vague, such as writability, whereas others are specific language constructs, such as exception handling. Furthermore, although the discussion might seem to imply that the criteria have equal importance, that implication is not intended, and it is clearly not the case.
One of the most important criteria for judging a programming language is the ease with which programs can be read and understood. Before 1970, software development was largely thought of in terms of writing code. The primary positive characteristic of programming languages was efficiency. Language constructs were designed more from the point of view of the computer than of the computer users. In the 1970s, however, the software life-cycle concept (Booch, 1987) was developed; coding was relegated to a much smaller role, and maintenance was recognized as a major part of the cycle, particularly in terms of cost. Because ease of maintenance is determined in large part by the readability of programs, readability became an important measure of the quality of programs and programming languages. This was an important juncture in the evolution of programming languages. There was a distinct crossover from a focus on machine orientation to a focus on human orientation.
Readability must be considered in the context of the problem domain. For example, if a program that describes a computation is written in a language not designed for such use, the program may be unnatural and convoluted, making it unusually difficult to read.
The following subsections describe characteristics that contribute to the readability of a programming language.
The overall simplicity of a programming language strongly affects its readability. A language with a large number of basic constructs is more difficult to learn than one with a smaller number. Programmers who must use a large language often learn a subset of the language and ignore its other features. This learning pattern is sometimes used to excuse the large number of language constructs, but that argument is not valid. Readability problems occur whenever the program’s author has learned a different subset from the one with which the reader is familiar.
A second complicating characteristic of a programming language is feature multiplicity—that is, having more than one way to accomplish a particular operation. For example, in Java, a user can increment a simple integer variable in four different ways:
count = count + 1
count += 1
count++
++count
Although the last two statements have slightly different meanings from each other and from the others in some contexts, all of them have the same meaning when used as stand-alone expressions. These variations are discussed in Chapter 7.
A third potential problem is operator overloading, in which a single operator symbol has more than one meaning. Although this is often useful, it can lead to reduced readability if users are allowed to create their own overloading and do not do it sensibly. For example, it is clearly acceptable to overload + to use it for both integer and floating-point addition. In fact, this overloading simplifies a language by reducing the number of operators. However, suppose the programmer defined + used between single-dimensioned array operands to mean the sum of all elements of both arrays. Because the usual meaning of vector addition is quite different from this, this unusual meaning could confuse both the author and the program’s readers. An even more extreme example of program confusion would be a user defining + between two vector operands to mean the difference between their respective first elements. Operator overloading is further discussed in Chapter 7.
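The contrast between a sensible and a confusing user-defined overload can be sketched in Python, which lets a class redefine + through its `__add__` method. The class names `Vector` and `OddVector` are hypothetical and were invented only for this illustration; this is a minimal sketch, not an example drawn from any particular library.

```python
class Vector:
    """+ means element-wise addition, the usual meaning for vectors."""

    def __init__(self, *elements):
        self.elements = list(elements)

    def __add__(self, other):
        # Pair up corresponding elements and add each pair.
        return Vector(*(a + b for a, b in zip(self.elements, other.elements)))


class OddVector(Vector):
    """+ is overloaded to mean the sum of all elements of both operands."""

    def __add__(self, other):
        # Legal, but surprising: the result is a single number, not a vector.
        return sum(self.elements) + sum(other.elements)


sensible = Vector(1, 2, 3) + Vector(10, 20, 30)
surprising = OddVector(1, 2, 3) + OddVector(10, 20, 30)
print(sensible.elements)  # element-wise sums
print(surprising)         # one scalar total
```

Both expressions look identical at the point of use, so a reader cannot tell which meaning of + applies without finding the class definition.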
Simplicity in languages can, of course, be carried too far. For example, the form and meaning of most assembly language statements are models of simplicity, as you can see when you consider the statements that appear in the next section. This very simplicity, however, makes assembly language programs less readable. Because they lack more complex control statements, program structure is less obvious; because the statements are simple, far more of them are required than in equivalent programs in a high-level language. These same arguments apply to the less extreme case of high-level languages with inadequate control and data-structuring constructs.
Orthogonality in a programming language means that a relatively small set of primitive constructs can be combined in a relatively small number of ways to build the control and data structures of the language. Furthermore, every possible combination of primitives is legal and meaningful. For example, consider data types. Suppose a language has four primitive data types (integer, float, double, and character) and two type operators (array and pointer). If the two type operators can be applied to themselves and the four primitive data types, a large number of data structures can be defined.
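To get a feel for how quickly the number of definable types grows, the following Python sketch enumerates the types obtained by applying the two type operators repeatedly to the four primitives. The function name `types_up_to` and the textual type descriptions are assumptions made for this illustration only.

```python
PRIMITIVES = ["integer", "float", "double", "character"]
TYPE_OPERATORS = ["array of", "pointer to"]


def types_up_to(depth):
    """Return every type built by applying at most `depth` type operators."""
    current = list(PRIMITIVES)   # types that use no type operator
    result = list(current)
    for _ in range(depth):
        # Apply each operator to every type produced at the previous level.
        current = [f"{op} {t}" for op in TYPE_OPERATORS for t in current]
        result.extend(current)
    return result


for depth in range(4):
    print(depth, len(types_up_to(depth)))
```

Each additional level of nesting doubles the number of new types, so six primitives and operators already describe dozens of distinct data structures after only a few applications.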
The meaning of an orthogonal language feature is independent of the context of its appearance in a program. (The word orthogonal comes from the mathematical concept of orthogonal vectors, which are independent of each other.) Orthogonality follows from a symmetry of relationships among primitives. A lack of orthogonality leads to exceptions to the rules of the language. For example, in a programming language that supports pointers, it should be possible to define a pointer to point to any specific type defined in the language. However, if pointers are not allowed to point to arrays, many potentially useful user-defined data structures cannot be defined.
We can illustrate the use of orthogonality as a design concept by comparing one aspect of the assembly languages of the IBM mainframe computers and the VAX series of minicomputers. We consider a single simple situation: adding two 32-bit integer values that reside in either memory or registers and replacing one of the two values with the sum. The IBM mainframes have two instructions for this purpose, which have the forms
A Reg1, memory_cell
AR Reg1, Reg2
where Reg1 and Reg2 represent registers. The semantics of these are
Reg1 ← contents(Reg1) + contents(memory_cell)
Reg1 ← contents(Reg1) + contents(Reg2)
The VAX addition instruction for 32-bit integer values is
ADDL operand_1, operand_2
whose semantics is
operand_2 ← contents(operand_1) + contents(operand_2)
In this case, either operand can be a register or a memory cell.
The VAX instruction design is orthogonal in that a single instruction can use either registers or memory cells as its operands. There are two ways to specify operands, which can be combined in all possible ways. The IBM design is not orthogonal. Only two of the four possible operand combinations are legal, and the two require different instructions, A and AR. The IBM design is more restricted and therefore less writable. For example, you cannot add two values and store the sum in a memory location. Furthermore, the IBM design is more difficult to learn because of the restrictions and the additional instruction.
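The comparison can be summarized as a small table of operand pairings. The Python sketch below is a toy model built only from the instruction forms given above; the dictionary names and the `describe` helper are hypothetical, and nothing here attempts to model the real instruction sets.

```python
from itertools import product

OPERAND_KINDS = ("register", "memory")

# (destination kind, source kind) -> mnemonic, following the forms given above.
IBM = {
    ("register", "memory"): "A",
    ("register", "register"): "AR",
}
# One instruction covers every pairing of operand kinds.
VAX = {pair: "ADDL" for pair in product(OPERAND_KINDS, repeat=2)}


def describe(name, table):
    """Summarize how many operand pairings a design accepts."""
    legal = [pair for pair in product(OPERAND_KINDS, repeat=2) if pair in table]
    mnemonics = sorted(set(table.values()))
    return f"{name}: {len(legal)} of 4 pairings legal, instructions {mnemonics}"


print(describe("IBM", IBM))
print(describe("VAX", VAX))
```

The missing entries in the first table are exactly the exceptions a programmer has to memorize, such as the absence of any pairing whose destination is a memory cell.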
Orthogonality is closely related to simplicity: The more orthogonal the design of a language, the fewer exceptions the language rules require. Fewer exceptions mean a higher degree of regularity in the design, which makes the language easier to learn, read, and understand. Anyone who has learned a significant part of the English language can testify to the difficulty of learning its many rule exceptions (for example, i before e except after c).
As examples of the lack of orthogonality in a high-level language, consider the following rules and exceptions in C. Although C has two kinds of structured data types, arrays and records (structs), records can be returned from functions but arrays cannot. A member of a structure can be any data type except void or a structure of the same type. An array element can be any data type except void or a function. Parameters are passed by value, unless they are arrays, in which case they are, in effect, passed by reference (because the appearance of an array name without a subscript in a C program is interpreted to be the address of the array’s first element).
As an example of context dependence, consider the C expression
a + b
This expression often means that the values of a and b are fetched and added together. However, if a happens to be a pointer and b is an integer, it affects the value of b. For example, if a points to a float value that occupies four bytes, then the value of b must be scaled—in this case multiplied by 4—before it is added to a. Therefore, the type of a affects the treatment of the value of b. The context of b affects its meaning.
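The scaling step can be made visible with Python's standard `ctypes` module, which exposes C-style arrays and raw addresses. In this sketch the names `a`, `b`, and `values` merely stand in for the pointer, the integer, and the array the pointer refers to; it assumes a platform on which a C float occupies four bytes, as in the example above.

```python
import ctypes

# A C-style array of four 4-byte floats (values chosen to be exact in float32).
values = (ctypes.c_float * 4)(1.5, 2.5, 3.5, 4.5)

a = ctypes.addressof(values)            # plays the role of the pointer a
b = 2                                   # plays the role of the integer b
element_size = ctypes.sizeof(ctypes.c_float)

# "a + b" on a pointer: b is scaled by the element size before the addition.
scaled_address = a + b * element_size

# Reading a float at the scaled address yields element number b.
fetched = ctypes.c_float.from_address(scaled_address).value
print(element_size, scaled_address - a, fetched)
```

Adding the unscaled value of b to the address would land in the middle of an element, which is why the type of a has to influence how b is treated.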
Too much orthogonality can also cause problems. Perhaps the most orthogonal programming language is ALGOL 68 (van Wijngaarden et al., 1969). Every language construct in ALGOL 68 has a type, and there are no restrictions on those types. In addition, most constructs produce values. This combinational freedom allows extremely complex constructs. For example, a conditional can appear as the left side of an assignment, along with declarations and other assorted statements, as long as the result is an address. This extreme form of orthogonality leads to unnecessary complexity. Furthermore, because languages require a large number of primitives, a high degree of orthogonality results in an explosion of combinations. So, even if the combinations are simple, their sheer numbers lead to complexity.
Simplicity in a language, therefore, is at least in part the result of a combination of a relatively small number of primitive constructs and a limited use of the concept of orthogonality.
Some believe that functional languages offer a good combination of simplicity and orthogonality. A functional language, such as Lisp, is one in which computations are made primarily by applying functions to given parameters. In contrast, in imperative languages such as C, C++, and Java, computations are usually specified with variables and assignment statements. Functional languages offer potentially the greatest overall simplicity, because they can accomplish everything with a single construct, the function call, which can be combined simply with other function calls. This simple elegance is the reason why some language researchers are attracted to functional languages as the primary alternative to complex nonfunctional languages such as Java. Other factors, the most important of which is probably efficiency, however, have prevented functional languages from becoming more widely used.
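The difference between the two styles can be sketched in a single language. The Python functions below compute the same sum of squares, first with a variable and assignment statements and then purely by applying functions to arguments; the function names are hypothetical, and Python is used only because it supports both styles, not because it is a functional language.

```python
from functools import reduce


def sum_of_squares_imperative(numbers):
    # Imperative style: a variable is updated by repeated assignment.
    total = 0
    for n in numbers:
        total = total + n * n
    return total


def sum_of_squares_functional(numbers):
    # Functional style: the result is a composition of function applications.
    return reduce(lambda acc, sq: acc + sq, map(lambda n: n * n, numbers), 0)


data = [1, 2, 3, 4]
print(sum_of_squares_imperative(data), sum_of_squares_functional(data))
```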
The presence of adequate facilities for defining data types and data structures in a language is another significant aid to readability. For example, suppose a numeric type is used for an indicator flag because there is no Boolean type in the language. In such a language, for example, in the original version of C, we might have an assignment such as the following:
timeout = 1
The meaning of this statement is unclear, whereas in a language that includes Boolean types, we would have the following:
timeout = true
The meaning of this statement is perfectly clear.
The syntax, or form, of the elements of a language has a significant effect on the readability of programs. Following are some examples of syntactic design choices that affect readability:
Special words. Program appearance and thus program readability are strongly influenced by the forms of a language’s special words (for example, while, class, and for). Especially important is the method of forming compound statements, or statement groups, primarily in control constructs. Some languages have used matching pairs of special words or symbols to form groups. C and its descendants use braces to specify compound statements. All of these languages have diminished readability because statement groups are always terminated in the same way, which makes it difficult to determine which group is being ended when an end or a right brace appears. Fortran 95 and Ada (ISO/IEC, 2014) make this clearer by using a distinct closing syntax for each type of statement group. For example, Ada uses end if to terminate a selection construct and end loop to terminate a loop construct. This is an example of the conflict between simplicity that results in fewer reserved words, as in Java, and the greater readability that can result from using more reserved words, as in Ada.
Another important issue is whether the special words of a language can be used as names for program variables. If so, then the resulting programs can be very confusing. For example, in Fortran 95, special words, such as Do and End, are legal variable names, so the appearance of these words in a program may or may not connote something special.
Form and meaning. Designing statements so that their appearance at least partially indicates their purpose is an obvious aid to readability. Semantics, or meaning, should follow directly from syntax, or form. In some cases, this principle is violated by two language constructs that are identical or similar in appearance but have different meanings, depending perhaps on context. In C, for example, the meaning of the reserved word static depends on the context of its appearance. If used on the definition of a variable inside a function, it means the variable is created at compile time. If used on the definition of a variable that is outside all functions, then it means the variable is visible only in the file in which its definition appears; that is, it is not exported from that file.
One of the primary complaints about the shell commands of UNIX (Robbins, 2005) is that their appearance does not always suggest their function. For example, the meaning of the UNIX command grep can be deciphered only through prior knowledge, or perhaps cleverness and familiarity with the UNIX editor, ed. The appearance of grep connotes nothing to UNIX beginners. (In ed, the command /regular_expression/ searches for a substring that matches the regular expression. Preceding this with g makes it a global command, specifying that the scope of the search is the whole file being edited. Following the command with p specifies that lines with the matching substring are to be printed. So g/regular_expression/p, which can obviously be abbreviated as grep, prints all lines in a file that contain substrings that match its operand, which is a regular expression.)
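What g/regular_expression/p does can be sketched in a few lines of Python using the standard `re` module. The function below is a hypothetical illustration of the behavior described above, not a reimplementation of the UNIX utility; note that Python's regular-expression syntax differs in detail from that of ed.

```python
import re


def grep(regular_expression, lines):
    """Return the lines containing a substring that matches the expression."""
    pattern = re.compile(regular_expression)
    # g: consider every line; p: "print" (here, collect) each matching line.
    return [line for line in lines if pattern.search(line)]


text = [
    "the first line",
    "a second line with a number 42",
    "nothing to see",
    "7 more numbers: 8 9",
]
for line in grep(r"[0-9]+", text):
    print(line)
```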
Writability is a measure of how easily a language can be used to create programs for a chosen problem domain. Most of the language characteristics that affect readability also affect writability. This follows directly from the fact that the process of writing a program requires the programmer frequently to reread the part of the program that is already written.
As is the case with readability, writability must be considered in the context of the target problem domain of a language. It simply is not fair to compare the writability of two languages in the realm of a particular application when one was designed for that application and the other was not. For example, the writabilities of Visual BASIC (VB) (Halvorson, 2013) and C are dramatically different for creating a program that has a graphical user interface (GUI), for which VB is ideal. Their writabilities are also quite different for writing systems programs, such as an operating system, for which C was designed.
The following subsections describe the most important characteristics influencing the writability of a language.
If a language has a large number of different constructs, some programmers who use the language might not be familiar with all of them. This situation can lead to a misuse of some features and a disuse of others that may be either more elegant or more efficient, or both, than those that are used. It may even be possible, as noted by Hoare (1973), to use unknown features accidentally, with bizarre results. Therefore, a smaller number of primitive constructs and a consistent set of rules for combining them (that is, orthogonality) is much better than simply having a large number of primitives. A programmer can design a solution to a complex problem after learning only a simple set of primitive constructs.
On the other hand, too much orthogonality can be a detriment to writability. Errors in programs can go undetected when nearly any combination of primitives is legal. This can lead to code absurdities that cannot be discovered by the compiler.
Expressivity in a language can refer to several different characteristics. In a language such as APL (Gilman and Rose, 1983), it means that there are very powerful operators that allow a great deal of computation to be accomplished with a very small program. More commonly, it means that a language has relatively convenient, rather than cumbersome, ways of specifying computations. For example, in C, the notation count++ is more convenient and shorter than count = count + 1. Also, the and then Boolean operator in Ada is a convenient way of specifying short-circuit evaluation of a Boolean expression. The inclusion of the for statement in Java makes writing counting loops easier than with the use of while, which is also possible. All of these increase the writability of a language.
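Two of these conveniences can be sketched in Python, which has counterparts to both: a for statement that manages the loop counter, and an `and` operator that performs short-circuit evaluation. The function names are hypothetical and serve only this illustration.

```python
def find_with_while(items, key):
    # Counting loop spelled out by hand: initialize, test, and increment.
    index = 0
    while index < len(items):
        if items[index] == key:
            return index
        index += 1
    return -1


def find_with_for(items, key):
    # The same counting loop; the for statement manages the counter.
    for index in range(len(items)):
        if items[index] == key:
            return index
    return -1


def first_is(items, key):
    # Short-circuit evaluation: items[0] is read only when the list is non-empty.
    return len(items) > 0 and items[0] == key


print(find_with_while([4, 8, 15], 15), find_with_for([4, 8, 15], 15))
print(first_is([], 4), first_is([4, 8], 4))
```

Without short-circuit evaluation, the second operand of `first_is` would be evaluated on an empty list and fail, so the test would have to be written as two nested selections.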
A program is said to be reliable if it performs to its specifications under all conditions. The following subsections describe several language features that have a significant effect on the reliability of programs in a given language.
Type checking is simply testing for type errors in a given program, either by the compiler or during program execution. Type checking is an important factor in language reliability. Because run-time type checking is expensive, compile-time type checking is more desirable. Furthermore, the earlier errors in programs are detected, the less expensive it is to make the required repairs. The design of Java requires checks of the types of nearly all variables and expressions at compile time. This virtually eliminates type errors at run time in Java programs. Types and type checking are discussed in depth in Chapter 6.
One example of how failure to type check, at either compile time or run time, has led to countless program errors is the use of subprogram parameters in the original C language (Kernighan and Ritchie, 1978). In this language, the type of an actual parameter in a function call was not checked to determine whether its type matched that of the corresponding formal parameter in the function. An int type variable could be used as an actual parameter in a call to a function that expected a float type as its formal parameter, and neither the compiler nor the run-time system would detect the inconsistency. For example, because the bit string that represents the integer 23 is essentially unrelated to the bit string that represents a floating-point 23, if an integer 23 is sent to a function that expects a floating-point parameter, any uses of the parameter in the function will produce nonsense. Furthermore, such problems are often difficult to diagnose.3 The current version of C has eliminated this problem by requiring all parameters to be type checked. Subprograms and parameter-passing techniques are discussed in Chapter 9.
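The claim that the two bit strings are unrelated can be checked with Python's standard `struct` module. The sketch assumes a 32-bit integer and an IEEE 754 single-precision float, which the text does not specify but which match the most common C implementations; it illustrates only the mismatch of bit patterns, not the details of how original C passed arguments.

```python
import struct


def int_bits_as_float(value):
    """Reinterpret the 32 bits of an integer as a single-precision float."""
    return struct.unpack("<f", struct.pack("<i", value))[0]


def float_bits(value):
    """Return the 32-bit pattern of a single-precision float as an integer."""
    return struct.unpack("<I", struct.pack("<f", value))[0]


print(f"integer 23 bits: {23:#010x}")
print(f"float 23.0 bits: {float_bits(23.0):#010x}")
print("integer 23 read as a float:", int_bits_as_float(23))
```

Under these assumptions, the bits of the integer 23 denote a number vanishingly close to zero when read as a float, which is the kind of nonsense value the function would silently compute with.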
The ability of a program to intercept run-time errors (as well as other unusual conditions detectable by the program), take corrective measures, and then continue is an obvious aid to reliability. This language facility is called exception handling. Ada, C++, Java, and C# include extensive capabilities for exception handling, but such facilities are practically nonexistent in some widely used languages, for example, C. Exception handling is discussed in Chapter 14.
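The intercept, correct, and continue pattern can be sketched with Python's try and except statements. The function `parse_readings` and its `default` parameter are hypothetical names chosen for this illustration.

```python
def parse_readings(raw_values, default=0):
    """Convert strings to integers, recovering from malformed entries."""
    readings = []
    for raw in raw_values:
        try:
            readings.append(int(raw))
        except ValueError:
            # Corrective measure: substitute a default, then keep going.
            readings.append(default)
    return readings


print(parse_readings(["12", "oops", "7"]))
```

In a language without such a facility, every conversion would need an explicit validity test before the call, or a single malformed entry would terminate the program.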
Loosely defined, aliasing is having two or more distinct names in a program that can be used to access the same memory cell. It is now generally accepted that aliasing is a dangerous feature in a programming language. Most programming languages allow some kind of aliasing—for example, two pointers (or references) set to point to the same variable, which is possible in most languages. In such a program, the programmer must always remember that changing the value pointed to by one of the two changes the value referenced by the other. Some kinds of aliasing, as described in Chapters 5 and 9, can be prohibited by the design of a language.
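A minimal Python sketch of the hazard follows. In Python, assigning one list variable to another creates a second reference to the same object rather than a copy, which makes the two names aliases; the variable names are hypothetical.

```python
scores = [90, 85, 70]
alias = scores               # a second name for the same list object, not a copy

alias[0] = 0                 # a change made through one name...
print(scores)                # ...is visible through the other

independent = list(scores)   # an explicit copy is not an alias
independent[1] = 0
print(scores)                # unchanged by the update to the copy
```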
In some languages, aliasing is used to overcome deficiencies in the language’s data abstraction facilities. Other languages greatly restrict aliasing to increase their reliability.
Both readability and writability influence reliability. A program written in a language that does not support natural ways to express the required algorithms will necessarily use unnatural approaches. Unnatural approaches are less likely to be correct for all possible situations. The easier a program is to write, the more likely it is to be correct.
Readability affects reliability in both the writing and maintenance phases of the life cycle. Programs that are difficult to read are difficult both to write and to modify.
The total cost of a programming language is a function of many of its characteristics.
First, there is the cost of training programmers to use the language, which is a function of the simplicity and orthogonality of the language and the experience of the programmers. Although more powerful languages are not necessarily more difficult to learn, they often are.
Second, there is the cost of writing programs in the language. This is a function of the writability of the language, which depends in part on its closeness in purpose to the particular application. The original efforts to design and implement high-level languages were driven by the desire to lower the costs of creating software.
Both the cost of training programmers and the cost of writing programs in a language can be significantly reduced in a good programming environment. Programming environments are discussed in Section 1.8.
Third, the cost of executing programs written in a language is greatly influenced by that language’s design. A language that requires many run-time type checks will prohibit fast code execution, regardless of the quality of the compiler. Although execution efficiency was the foremost concern in the design of early languages, it is now considered to be less important.
A simple trade-off can be made between compilation cost and execution speed of the compiled code. Optimization is the name given to the collection of techniques that compilers may use to decrease the size and/or increase the execution speed of the code they produce. If little or no optimization is done, compilation can be done much faster than if a significant effort is made to produce optimized code. The choice between the two alternatives is influenced by the environment in which the compiler will be used. In a laboratory for beginning programming students, who often compile their programs many times during development but use little execution time (their programs are small and they must execute correctly only once), little or no optimization should be done. In a production environment, where compiled programs are executed many times after development, it is better to pay the extra cost to optimize the code.
Fourth, there is the cost of poor reliability. If the software fails in a critical system, such as a nuclear power plant or an X-ray machine for medical use, the cost could be very high. The failures of noncritical systems can also be very expensive in terms of lost future business or lawsuits over defective software systems.
The final consideration is the cost of maintaining programs, which includes both corrections and modifications to add new functionality. The cost of software maintenance depends on a number of language characteristics, primarily readability. Because maintenance is often done by individuals other than the original author of the software, poor readability can make the task extremely challenging.
The importance of software maintainability cannot be overstated. It has been estimated that for large software systems with relatively long lifetimes, maintenance costs can be as high as two to four times as much as development costs (Sommerville, 2010).
Of all the contributors to language costs, three are most important: program development, maintenance, and reliability. Because these are functions of writability and readability, these two evaluation criteria are, in turn, the most important.
Of course, a number of other criteria could be used for evaluating programming languages. One example is portability, or the ease with which programs can be moved from one implementation to another. Portability is most strongly influenced by the degree of standardization of the language. Some languages are not standardized at all, making programs in these languages very difficult to move from one implementation to another. This problem is alleviated in some cases by the fact that implementations for some languages now have single sources. Standardization is a time-consuming and difficult process. A committee began work on producing a standard version of C++ in 1989. It was approved in 1998.
Generality (the applicability to a wide range of applications) and well-definedness (the completeness and precision of the language’s official defining document) are two other criteria.
Most criteria, particularly readability, writability, and reliability, are neither precisely defined nor accurately measurable. Nevertheless, they are useful concepts and they provide valuable insight into the design and evaluation of programming languages.
A final note on evaluation criteria: language design criteria are weighed differently from different perspectives. Language implementors are concerned primarily with the difficulty of implementing the constructs and features of the language. Language users are worried about writability first and readability later. Language designers are likely to emphasize elegance and the ability to attract widespread use. These characteristics often conflict with one another.
In addition to those factors described in Section 1.3, several other factors influence the basic design of programming languages. The most important of these are computer architecture and programming design methodologies.
The basic architecture of computers has had a profound effect on language design. Most of the popular languages of the past 60 years have been designed around the prevalent computer architecture, called the von Neumann architecture, after one of its originators, John von Neumann (pronounced “von Noyman”). These languages are called imperative languages. In a von Neumann computer, both data and programs are stored in the same memory. The central processing unit (CPU), which executes instructions, is separate from the memory. Therefore, instructions and data must be transmitted, or piped, from memory to the CPU. Results of operations in the CPU must be moved back to memory. Nearly all digital computers built since the 1940s have been based on the von Neumann architecture. The overall structure of a von Neumann computer is shown in Figure 1.1.
Because of the von Neumann architecture, the central features of imperative languages are variables, which model the memory cells; assignment statements, which are based on the piping operation; and the iterative form of repetition, which is the most efficient way to implement repetition on this architecture. Operands in expressions are piped from memory to the CPU, and the result of evaluating the expression is piped back to the memory cell represented by the left side of the assignment. Iteration is fast on von Neumann computers because instructions are stored in adjacent cells of memory and repeating the execution of a section of code requires only a branch instruction. This efficiency discourages the use of recursion for repetition, although recursion is sometimes more natural.
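The contrast between the two forms of repetition can be sketched in Python. Both functions below are illustrative assumptions rather than anything taken from the text; the comments in the iterative version point at the three features the paragraph names.

```python
def sum_iterative(values):
    total = 0                  # variable: models a memory cell
    for v in values:           # iteration: a loop closed by a branch
        total = total + v      # assignment: result stored back to memory
    return total


def sum_recursive(values):
    """The same computation expressed with recursion instead of a loop."""
    if not values:
        return 0
    return values[0] + sum_recursive(values[1:])


print(sum_iterative([1, 2, 3, 4]), sum_recursive([1, 2, 3, 4]))
```

The recursive version makes one call per element, so on a conventional implementation it does more work per step than the loop, which only branches back to its first instruction.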
The execution of a machine code program on a von Neumann architecture computer occurs in a process called the fetch-execute cycle. As stated earlier, programs reside in memory but are executed in the CPU. Each instruction to be executed must be moved from memory to the processor. The address of the next instruction to be executed is maintained in a register called the program counter. The fetch-execute cycle can be simply described by the following algorithm:
initialize the program counter
repeat forever
  fetch the instruction pointed to by the program counter
  increment the program counter to point at the next instruction
  decode the instruction
  execute the instruction
end repeat
The “decode the instruction” step in the algorithm means the instruction is examined to determine what action it specifies. Program execution terminates when a stop instruction is encountered, although on an actual computer a stop instruction is rarely executed. Rather, control transfers from the operating system to a user program for its execution and then back to the operating system when the user program execution is complete. In a computer system in which more than one user program may be in memory at a given time, this process is far more complex.
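The cycle can be simulated with a toy machine in Python. This is only a sketch under stated assumptions: the four opcodes (`LOAD`, `ADD`, `STORE`, `HALT`), the single accumulator, and the tuple encoding of instructions are all hypothetical and do not correspond to any real instruction set.

```python
def run(memory):
    """Run the program stored in `memory` until a HALT instruction."""
    pc = 0                               # initialize the program counter
    acc = 0                              # a single accumulator register
    while True:                          # repeat forever
        instruction = memory[pc]         # fetch
        pc += 1                          # increment the program counter
        opcode, operand = instruction    # decode
        if opcode == "LOAD":             # execute
            acc = memory[operand]
        elif opcode == "ADD":
            acc += memory[operand]
        elif opcode == "STORE":
            memory[operand] = acc
        elif opcode == "HALT":
            return memory
        else:
            raise ValueError(f"unknown opcode: {opcode}")


# Program and data share one memory, as in a von Neumann machine.
memory = [
    ("LOAD", 4),     # address 0: acc = memory[4]
    ("ADD", 5),      # address 1: acc += memory[5]
    ("STORE", 6),    # address 2: memory[6] = acc
    ("HALT", None),  # address 3: stop
    2,               # address 4: data
    3,               # address 5: data
    0,               # address 6: result
]
run(memory)
print(memory[6])
```

Every operand travels from memory to the processor and every result travels back, which is the traffic the surrounding discussion describes.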
As stated earlier, a functional, or applicative, language is one in which the primary means of computation is applying functions to given parameters. Programming can be done in a functional language without the kind of variables that are used in imperative languages, without assignment statements, and without iteration. Although many computer scientists have expounded on the myriad benefits of functional languages, such as Scheme, it is unlikely that they will displace the imperative languages until a non–von Neumann computer is designed that allows efficient execution of programs in functional languages. Among those who have bemoaned this fact, perhaps the most eloquent was John Backus (1978), the principal designer of the original version of Fortran.
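Although Python is not a functional language, it can be used to sketch the style: the computation below is expressed entirely as the application of functions to arguments, with no loop and no variable that is reassigned. The names `square` and `sum_of_squares` are assumptions made for this example.

```python
from functools import reduce


def square(x):
    return x * x


def sum_of_squares(values):
    """Sum of squares built only from function application."""
    return reduce(lambda acc, x: acc + x, map(square, values), 0)


print(sum_of_squares([1, 2, 3]))
```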
In spite of the fact that the structure of imperative programming languages is modeled on a machine architecture, rather than on the abilities and inclinations of the users of programming languages, some believe that using imperative languages is somehow more natural than using a functional language. So, these people believe that even if functional programs were as efficient as imperative programs, the use of imperative programming languages would still dominate.
The late 1960s and early 1970s brought an intense analysis, begun in large part by the structured-programming movement, of both the software development process and programming language design.
An important reason for this research was the shift in the major cost of computing from hardware to software, as hardware costs decreased and programmer costs increased. Increases in programmer productivity were relatively small. In addition, progressively larger and more complex problems were being solved by computers. Rather than simply solving sets of equations to simulate satellite tracks, as in the early 1960s, programs were being written for large and complex tasks, such as controlling large petroleum-refining facilities and providing worldwide airline reservation systems.
The new software development methodologies that emerged as a result of the research of the 1970s were called top-down design and stepwise refinement. The primary programming language deficiencies that were discovered were incompleteness of type checking and inadequacy of control statements (requiring the extensive use of gotos).
In the late 1970s, a shift from procedure-oriented to data-oriented program design methodologies began. Simply put, data-oriented methods emphasize data design, focusing on the use of abstract data types to solve problems.
For data abstraction to be used effectively in software system design, it must be supported by the languages used for implementation. The first language to provide even limited support for data abstraction was SIMULA 67 (Birtwistle et al., 1973), although that language certainly was not propelled to popularity because of it. The benefits of data abstraction were not widely recognized until the early 1970s. However, most languages designed since the late 1970s support data abstraction, which is discussed in detail in Chapter 11.
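A small abstract data type can be sketched in Python as follows. The `Stack` class is a hypothetical example; note that Python marks the representation as private only by the leading-underscore convention, so the hiding shown here is weaker than what languages with enforced access control provide.

```python
class Stack:
    """A stack whose representation is hidden behind its operations."""

    def __init__(self):
        self._items = []      # representation; not part of the interface

    def push(self, item):
        self._items.append(item)

    def pop(self):
        if not self._items:
            raise IndexError("pop from empty stack")
        return self._items.pop()

    def is_empty(self):
        return not self._items


s = Stack()
s.push(1)
s.push(2)
print(s.pop(), s.pop(), s.is_empty())
```

Clients use only `push`, `pop`, and `is_empty`, so the list could later be replaced by another representation without changing client code.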
The latest step in the evolution of data-oriented software development, which began in the early 1980s, is object-oriented design. Object-oriented methodology begins with data abstraction, which encapsulates processing with data objects and controls access to data, and adds inheritance and dynamic method binding. Inheritance is a powerful concept that greatly enhances the potential reuse of existing software, thereby providing the possibility of significant increases in software development productivity. This is an important factor in the increase in popularity of object-oriented languages. Dynamic (run-time) method binding allows more flexible use of inheritance.
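Inheritance and dynamic method binding can be sketched together in Python. The class names `Shape`, `Rectangle`, and `Square` are assumptions made for this illustration.

```python
class Shape:
    def area(self):
        raise NotImplementedError

    def describe(self):
        # Which area() runs is decided at run time by the object's class.
        return f"{type(self).__name__} with area {self.area():.2f}"


class Rectangle(Shape):
    def __init__(self, width, height):
        self.width = width
        self.height = height

    def area(self):
        return self.width * self.height


class Square(Rectangle):              # reuses Rectangle through inheritance
    def __init__(self, side):
        super().__init__(side, side)


for shape in (Rectangle(2, 3), Square(2)):
    print(shape.describe())
```

`describe` is written once in `Shape`, yet it calls whichever `area` the actual object supplies, and `Square` gets a working `area` without defining one.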
Object-oriented programming developed along with a language that supported its concepts: Smalltalk (Goldberg and Robson, 1989). Although Smalltalk never became as widely used as many other languages, support for object-oriented programming is now part of most popular imperative languages, including Java, C++, and C#. Object-oriented concepts have also found their way into functional programming in CLOS (Bobrow et al., 1988) and F# (Syme et al., 2010), as well as logic programming in Prolog++ (Moss, 1994). Language support for object-oriented programming is discussed in detail in Chapter 12.
Procedure-oriented programming is, in a sense, the opposite of data-oriented programming. Although data-oriented methods now dominate software development, procedure-oriented methods have not been abandoned. On the contrary, in recent years, a good deal of research has occurred in procedure-oriented programming, especially in the area of concurrency. These research efforts brought with them the need for language facilities for creating and controlling concurrent program units. Java and C# include such capabilities. Concurrency is discussed in detail in Chapter 13.
All of these evolutionary steps in software development methodologies led to new language constructs to support them.
Programming languages are often categorized into four bins: imperative, functional, logic, and object oriented. However, we do not consider languages that support object-oriented programming to form a separate category of languages. We have described how the most popular languages that support object-oriented programming grew out of imperative languages. Although the object-oriented software development paradigm differs significantly from the procedure-oriented paradigm usually used with imperative languages, the extensions to an imperative language required to support object-oriented programming are not intensive. For example, the expressions, assignment statements, and control statements of C and Java are nearly identical. (On the other hand, the arrays, subprograms, and semantics of Java are very different from those of C.) Similar statements can be made for functional languages that support object-oriented programming.
Some authors refer to scripting languages as a separate category of programming languages. However, languages in this category are bound together more by their implementation method, partial or full interpretation, than by a common language design. The languages that are typically called scripting languages, among them Perl, JavaScript, and Ruby (Flanagan and Matsumoto, 2008), are imperative languages in every sense.
A logic programming language is an example of a rule-based language. In an imperative language, an algorithm is specified in great detail, and the specific order of execution of the instructions or statements must be included. In a rule-based language, however, rules are specified in no particular order, and the language implementation system must choose an order in which the rules are used to produce the desired result. This approach to software development is radically different from those used with the other two categories of languages and clearly requires a completely different kind of language. Prolog, the most commonly used logic programming language, and logic programming are discussed in Chapter 16.
In recent years, a new category of languages has emerged, the markup/programming hybrid languages. Markup languages are not programming languages. For instance, HTML, the most widely used markup language, is used to specify the layout of information in Web documents. However, some programming capability has crept into some extensions to HTML and XML. Among these are the Java Server Pages Standard Tag Library (JSTL) and eXtensible Stylesheet Language Transformations (XSLT). Both of these are briefly introduced in Chapter 2. Those languages cannot be compared to any of the complete programming languages and therefore will not be discussed after Chapter 2.
The programming language evaluation criteria described in Section 1.3 provide a framework for language design. Unfortunately, that framework is self-contradictory. In his insightful paper on language design, Hoare (1973) stated that “there are so many important but conflicting criteria, that their reconciliation and satisfaction is a major engineering task.”
Two criteria that conflict are reliability and cost of execution. For example, the Java language definition demands that all references to array elements be checked to ensure that the index or indices are in their legal ranges. This step adds a great deal to the cost of execution of Java programs that contain large numbers of references to array elements. C does not require index range checking, so C programs execute faster than semantically equivalent Java programs, although Java programs are more reliable. The designers of Java traded execution efficiency for reliability.
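The cost in question is the comparison performed on every element access. The following Python sketch writes that check out explicitly; `checked_get` is a hypothetical helper, and it rejects negative indices (which Python would normally interpret as counting from the end) so that it mirrors the behavior the text attributes to Java.

```python
def checked_get(array, index):
    """Return array[index] after an explicit index range check."""
    if not 0 <= index < len(array):
        raise IndexError(f"index {index} out of range for length {len(array)}")
    return array[index]


data = [10, 20, 30]
print(checked_get(data, 1))
```

A program that touches millions of elements pays for the comparison millions of times; a language that omits the check saves that time but lets an out-of-range access go undetected.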
As another example of conflicting criteria that leads directly to design trade-offs, consider the case of APL. APL includes a powerful set of operators for array operands. Because of the large number of operators, a significant number of new symbols had to be included in APL to represent the operators. Also, many APL operators can be used in a single, long, complex expression. One result of this high degree of expressivity is that, for applications involving many array operations, APL is very writable. Indeed, a huge amount of computation can be specified in a very small program. Another result is that APL programs have very poor readability. A compact and concise expression has a certain mathematical beauty but it is difficult for anyone other than the programmer to understand. Well-known author Daniel McCracken (1970) once noted that it took him four hours to read and understand a four-line APL program. The designer of APL traded readability for writability.
The conflict between writability and reliability is a common one in language design. The pointers of C++ can be manipulated in a variety of ways, which supports highly flexible addressing of data. Because of the potential reliability problems with pointers, they are not included in Java.
Examples of conflicts among language design (and evaluation) criteria abound; some are subtle, others are obvious. It is therefore clear that the task of choosing constructs and features when designing a programming language requires many compromises and trade-offs.
As described in Section 1.4.1, two of the primary components of a computer are its internal memory and its processor. The internal memory is used to store programs and data. The processor is a collection of circuits that provides a realization of a set of primitive operations, or machine instructions, such as those for arithmetic and logic operations. In most computers, some of these instructions, which are sometimes called macroinstructions, are actually implemented with a set of instructions called microinstructions, which are defined at an even lower level. Because microinstructions are never seen by software, they will not be discussed further here.
The machine language of the computer is its set of instructions. In the absence of other supporting software, its own machine language is the only language that most hardware computers “understand.” Theoretically, a computer could be designed and built with a particular high-level language as its machine language, but it would be very complex and expensive. Furthermore, it would be highly inflexible, because it would be difficult (but not impossible) to use it with other high-level languages. The more practical machine design choice implements in hardware a very low-level language that provides the most commonly needed primitive operations and requires system software to create an interface to programs in higher-level languages.
A language implementation system cannot be the only software on a computer. Also required is a large collection of programs, called the operating system, which supplies higher-level primitives than those of the machine language. These primitives provide system resource management, input and output operations, a file management system, text and/or program editors, and a variety of other commonly needed functions. Because language implementation systems need many of the operating system facilities, they interface with the operating system rather than directly with the processor (in machine language).
The operating system and language implementations are layered over the machine language interface of a computer. These layers can be thought of as virtual computers, providing interfaces to the user at higher levels. For example, an operating system and a C compiler provide a virtual C computer. With other compilers, a machine can become other kinds of virtual computers. Most computer systems provide several different virtual computers. User programs form another layer over the top of the layer of virtual computers. The layered view of a computer is shown in Figure 1.2.
The implementation systems of the first high-level programming languages, constructed in the late 1950s, were among the most complex software systems of that time. In the 1960s, intensive research efforts were made to understand and formalize the process of constructing these high-level language implementations. The greatest success of those efforts was in the area of syntax analysis, primarily because that part of the implementation process is an application of parts of automata theory and formal language theory that were then well understood.
Programming languages can be implemented by any of three general methods. At one extreme, programs can be translated into machine language, which can be executed directly on the computer. This method is called a compiler implementation and has the advantage of very fast program execution, once the translation process is complete. Most production implementations of languages, such as C, COBOL, and C++, are by compilers.
The language that a compiler translates is called the source language. The process of compilation and program execution takes place in several phases, the most important of which are shown in Figure 1.3.
The lexical analyzer gathers the characters of the source program into lexical units. The lexical units of a program are identifiers, special words, operators, and punctuation symbols. The lexical analyzer ignores comments in the source program because the compiler has no use for them.
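A lexical analyzer of this kind can be sketched with Python's `re` module. The token categories follow the paragraph above, but the concrete patterns, the `#` comment syntax, and the set of special words are assumptions for a made-up toy language.

```python
import re

TOKEN_SPEC = [
    ("COMMENT",     r"#[^\n]*"),
    ("NUMBER",      r"\d+"),
    ("IDENTIFIER",  r"[A-Za-z_]\w*"),
    ("OPERATOR",    r"[+\-*/=]"),
    ("PUNCTUATION", r"[();]"),
    ("SKIP",        r"\s+"),
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))
SPECIAL_WORDS = {"if", "while", "return"}


def tokenize(source):
    """Group the characters of `source` into (kind, text) lexical units."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = TOKEN_RE.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at {pos}")
        kind, text = match.lastgroup, match.group()
        pos = match.end()
        if kind in ("COMMENT", "SKIP"):
            continue                      # comments and whitespace are dropped
        if kind == "IDENTIFIER" and text in SPECIAL_WORDS:
            kind = "SPECIAL_WORD"
        tokens.append((kind, text))
    return tokens


print(tokenize("while x = x + 42; # loop"))
```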
The syntax analyzer takes the lexical units from the lexical analyzer and uses them to construct hierarchical structures called parse trees. These parse trees represent the syntactic structure of the program. In many cases, no actual parse tree structure is constructed; rather, the information that would be required to build a tree is generated and used directly. Both lexical units and parse trees are discussed further in Chapter 3. Lexical analysis and syntax analysis, or parsing, are discussed in Chapter 4.
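As a sketch of how lexical units become a hierarchical structure, the following Python function parses sums and products into a parse tree represented as nested tuples. The grammar, the tuple representation, and the function name `parse_expression` are assumptions for this example; the one-line `re.findall` tokenizer is a simplification that silently ignores characters it does not recognize.

```python
import re


def parse_expression(source):
    """Parse sums and products of numbers and names into a nested-tuple tree."""
    tokens = re.findall(r"\d+|[A-Za-z_]\w*|[+*()]", source)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        token = tokens[pos]
        pos += 1
        return token

    def factor():
        token = advance()
        if token == "(":
            node = expr()
            if advance() != ")":
                raise SyntaxError("expected ')'")
            return node
        if token in {"+", "*", ")"}:
            raise SyntaxError(f"unexpected token {token!r}")
        return token                       # a number or an identifier

    def term():
        node = factor()
        while peek() == "*":
            advance()
            node = ("*", node, factor())
        return node

    def expr():
        node = term()
        while peek() == "+":
            advance()
            node = ("+", node, term())
        return node

    tree = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree


print(parse_expression("a + b * 2"))
```

The shape of the tree records that `*` binds more tightly than `+`, which is the syntactic structure later phases rely on.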
The intermediate code generator produces a program in a different language, at an intermediate level between the source program and the final output of the compiler: the machine language program. Intermediate languages sometimes look very much like assembly languages, and in fact, sometimes are actual assembly languages. In other cases, the intermediate code is at a level somewhat higher than an assembly language. The semantic analyzer is an integral part of the intermediate code generator. The semantic analyzer checks for errors, such as type errors, that are difficult, if not impossible, to detect during syntax analysis.
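One common style of intermediate code is three-address code, in which each instruction applies one operator and names a temporary for the result. The Python sketch below generates such code from an expression tree; the nested-tuple tree format, the `t1`, `t2` temporary names, and the function name `generate` are assumptions for illustration, and the text does not say which intermediate form any particular compiler uses.

```python
def generate(tree):
    """Translate a nested-tuple expression tree into three-address code."""
    code = []

    def emit(node):
        if not isinstance(node, tuple):
            return node                    # leaf: a name or a constant
        op, left, right = node
        left_place = emit(left)
        right_place = emit(right)
        temp = f"t{len(code) + 1}"         # fresh temporary for this result
        code.append(f"{temp} = {left_place} {op} {right_place}")
        return temp

    emit(tree)
    return code


for line in generate(("+", "a", ("*", "b", "2"))):
    print(line)
```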
Optimization improves programs (usually in their intermediate code version) by making them smaller, faster, or both. Because many kinds of optimization are difficult to do on machine language, most optimization is done on the intermediate code.
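Constant folding is one simple optimization of this kind: an operation whose operands are all known at compile time is replaced by its result. The Python sketch below applies it to an expression tree; the nested-tuple tree format, in which integer leaves are constants and string leaves are variable names, is an assumption for this example.

```python
import operator

OPS = {"+": operator.add, "*": operator.mul}


def fold_constants(node):
    """Replace operator nodes whose operands are both constants with the result."""
    if not isinstance(node, tuple):
        return node                        # leaf: nothing to fold
    op, left, right = node
    left, right = fold_constants(left), fold_constants(right)
    if isinstance(left, int) and isinstance(right, int):
        return OPS[op](left, right)        # computed once, at compile time
    return (op, left, right)


print(fold_constants(("+", "x", ("*", 2, 3))))
```

The folded tree is both smaller and faster to evaluate, since the multiplication no longer happens at run time.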
The code generator translates the optimized intermediate code version of the program into an equivalent machine language program.
The symbol table serves as a database for the compilation process. The primary contents of the symbol table are the type and attribute information of each user-defined name in the program. This information is placed in the symbol table by the lexical and syntax analyzers and is used by the semantic analyzer and the code generator.
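A symbol table can be sketched as a dictionary keyed by name. The class below is a hypothetical minimal version; the attribute names (`type`, `line`) and the method names `declare` and `lookup` are assumptions, and real compilers also track scopes, which this sketch omits.

```python
class SymbolTable:
    """Maps each user-defined name to its type and other attributes."""

    def __init__(self):
        self._entries = {}

    def declare(self, name, type_name, **attributes):
        # Called by the earlier phases when a declaration is seen.
        if name in self._entries:
            raise KeyError(f"{name!r} is already declared")
        self._entries[name] = {"type": type_name, **attributes}

    def lookup(self, name):
        # Called by later phases, e.g. to check the types in an expression.
        return self._entries[name]


table = SymbolTable()
table.declare("count", "int", line=3)
print(table.lookup("count"))
```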
As stated previously, although the machine language generated by a compiler can be executed directly on the hardware, it must nearly always be run along with some other code. Most user programs also require programs from the operating system. Among the most common of these are programs for input and output. The compiler builds calls to required system programs when they are needed by the user program. Before the machine language programs produced by a compiler can be executed, the required programs from the operating system must be found and linked to the user program. The linking operation connects the user program to the system programs by placing the addresses of the entry points of the system programs in the calls to them in the user program. The user and system code together are sometimes called a load module, or executable image. The process of collecting system programs and linking them to user programs is called linking and loading, or sometimes just linking. It is accomplished by a systems program called a linker.
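The address-patching step can be sketched in Python with a toy instruction format. Everything here is a hypothetical simplification: instructions are `(opcode, operand)` tuples, the library is a dictionary from routine name to instruction list, and library routines are assumed to contain no calls of their own.

```python
def link(user_code, library):
    """Append the needed library routines and patch calls with their addresses."""
    image = list(user_code)
    entry_points = {}
    needed = [arg for op, arg in user_code if op == "CALL"]
    for name in dict.fromkeys(needed):        # each routine once, in first-use order
        entry_points[name] = len(image)       # address of the routine's entry point
        image.extend(library[name])
    return [
        (op, entry_points[arg]) if op == "CALL" and arg in entry_points else (op, arg)
        for op, arg in image
    ]


user_code = [("PUSH", 7), ("CALL", "print_int"), ("CALL", "print_int"), ("HALT", None)]
library = {
    "print_int": [("OUT", None), ("RET", None)],
    "read_int": [("IN", None), ("RET", None)],
}
for address, instruction in enumerate(link(user_code, library)):
    print(address, instruction)
```

The result plays the role of the load module: user code followed by only the system routines it actually calls, with every symbolic call replaced by an address.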
除了系统程序之外,用户程序通常必须链接到库中先前编译的程序。因此,链接器不仅将给定程序链接到系统程序,还可以将其链接到其他用户或系统提供的程序。
In addition to systems programs, user programs must often be linked to previously compiled programs that reside in libraries. So the linker not only links a given program to system programs but may also link it to other user or system-supplied programs.
计算机内存和处理器之间的连接速度通常决定了计算机的速度,因为指令的执行速度通常比移动到处理器执行的速度更快。这种连接被称为冯·诺依曼瓶颈,它是冯·诺依曼结构计算机速度的主要限制因素。冯·诺依曼瓶颈一直是并行计算机研究和开发的主要动机之一。
The speed of the connection between a computer’s memory and its processor often determines the speed of the computer, because instructions often can be executed faster than they can be moved to the processor for execution. This connection is called the von Neumann bottleneck; it is the primary limiting factor in the speed of von Neumann architecture computers. The von Neumann bottleneck has been one of the primary motivations for the research and development of parallel computers.
在所有实现方法中,纯解释与编译截然相反。在这种方法中,程序由另一个称为解释器的程序解释,不进行任何翻译。解释器程序充当机器的软件模拟,其提取-执行周期处理高级语言程序语句而不是机器指令。这种软件模拟显然为该语言提供了一个虚拟机。
Pure interpretation lies at the opposite end (from compilation) among implementation methods. With this approach, programs are interpreted by another program called an interpreter, with no translation whatever. The interpreter program acts as a software simulation of a machine whose fetch-execute cycle deals with high-level language program statements rather than machine instructions. This software simulation obviously provides a virtual machine for the language.
纯解释的优点是可以轻松实现许多源代码级调试操作,因为所有运行时错误消息都可以引用源代码级单元。例如,如果发现数组索引超出范围,则错误消息可以轻松指出错误的源代码行和数组的名称。另一方面,这种方法有一个严重的缺点,即执行速度比编译系统慢 10 到 100 倍。这种缓慢的主要原因是高级语言语句的解码,这些语句比机器语言指令复杂得多(尽管语句的数量可能比等效机器代码中的指令少)。此外,无论语句执行多少次,每次都必须对其进行解码。因此,语句解码,而不是处理器和内存之间的连接,是纯解释器的瓶颈。
Pure interpretation has the advantage of allowing easy implementation of many source-level debugging operations, because all run-time error messages can refer to source-level units. For example, if an array index is found to be out of range, the error message can easily indicate the source line of the error and the name of the array. On the other hand, this method has the serious disadvantage that execution is 10 to 100 times slower than in compiled systems. The primary source of this slowness is the decoding of the high-level language statements, which are far more complex than machine language instructions (although there may be fewer statements than instructions in equivalent machine code). Furthermore, regardless of how many times a statement is executed, it must be decoded every time. Therefore, statement decoding, rather than the connection between the processor and memory, is the bottleneck of a pure interpreter.
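The cost of repeated decoding can be made concrete with a toy interpreter. This sketch assumes a hypothetical one-line statement form, `name = operand op operand`, and counts how often a statement is decoded. It is not modeled on any real interpreter.

```python
import operator

OPS = {"+": operator.add, "*": operator.mul}


def decode(statement):
    """Decode one source statement of the form 'target = left op right'."""
    target, expr = [part.strip() for part in statement.split("=", 1)]
    left, op, right = expr.split()
    return target, left, op, right


def value_of(token, env):
    # A token is either an unsigned integer constant or a variable name.
    return int(token) if token.isdigit() else env[token]


def run_pure(loop_body, repetitions, env):
    """Execute loop_body `repetitions` times, decoding the source text every time."""
    decode_count = 0
    for _ in range(repetitions):
        for statement in loop_body:
            target, left, op, right = decode(statement)  # repeated on every pass
            decode_count += 1
            env[target] = OPS[op](value_of(left, env), value_of(right, env))
    return decode_count


env = {"total": 0, "i": 0}
body = ["total = total + i", "i = i + 1"]
decodes = run_pure(body, 100, env)
print(decodes, env)
```

Two statements executed 100 times require 200 decodings, even though the source text never changes between passes.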
纯解释的另一个缺点是它通常需要更多空间。除了源程序之外,符号表在解释期间也必须存在。此外,源程序可能以易于访问和修改的形式存储,而不是以提供最小尺寸的形式存储。
Another disadvantage of pure interpretation is that it often requires more space. In addition to the source program, the symbol table must be present during interpretation. Furthermore, the source program may be stored in a form designed for easy access and modification rather than one that provides for minimal size.
尽管 20 世纪 60 年代的一些简单的早期语言(APL、SNOBOL(Griswold 等,1971)和 Lisp)是纯解释型的,但到了 20 世纪 80 年代,这种方法很少用于高级语言。然而,近年来,纯解释型语言在一些 Web 脚本语言(如 JavaScript 和 PHP)中卷土重来,如今这些语言已得到广泛应用。纯解释型语言的流程如图1.4 所示。
Although some simple early languages of the 1960s (APL, SNOBOL (Griswold et al., 1971), and Lisp) were purely interpreted, by the 1980s, the approach was rarely used on high-level languages. However, in recent years, pure interpretation has made a significant comeback with some Web scripting languages, such as JavaScript and PHP, which are now widely used. The process of pure interpretation is shown in Figure 1.4.
有些语言实现系统是编译器和纯解释器之间的折衷;它们将高级语言程序翻译成易于解释的中间语言。这种方法比纯解释更快,因为源语言语句只需解码一次。这种实现称为混合实现系统。
Some language implementation systems are a compromise between compilers and pure interpreters; they translate high-level language programs to an intermediate language designed to allow easy interpretation. This method is faster than pure interpretation because the source language statements are decoded only once. Such implementations are called hybrid implementation systems.
混合实现系统所采用的流程如图 1.5所示。它不将中间语言代码翻译成机器代码,而是简单地解释中间代码。
The process used in a hybrid implementation system is shown in Figure 1.5. Instead of translating intermediate language code to machine code, it simply interprets the intermediate code.
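A hybrid system can be sketched as two separate steps: translate the source once into an intermediate form, then interpret that form. The statement syntax and the tuple-based intermediate code below are assumptions for illustration only. Real intermediate languages such as byte code are far richer.

```python
import operator

OPS = {"+": operator.add, "*": operator.mul}


def parse_operand(token):
    # Constants become ints at translation time; variable names stay as strings.
    return int(token) if token.isdigit() else token


def translate(source_lines):
    """Decode each source statement once into a (target, function, left, right) tuple."""
    intermediate = []
    for line in source_lines:
        target, expr = [part.strip() for part in line.split("=", 1)]
        left, op, right = expr.split()
        intermediate.append((target, OPS[op], parse_operand(left), parse_operand(right)))
    return intermediate


def fetch(operand, env):
    return env[operand] if isinstance(operand, str) else operand


def interpret(intermediate, repetitions, env):
    """Run the already-decoded intermediate code; no source text is examined here."""
    for _ in range(repetitions):
        for target, fn, left, right in intermediate:
            env[target] = fn(fetch(left, env), fetch(right, env))
    return env


code = translate(["total = total + i", "i = i + 1"])  # decoding happens once, here
result = interpret(code, 100, {"total": 0, "i": 0})
print(result)
```

However many times the loop runs, the source statements are split and parsed only in `translate`, which is why this approach is faster than pure interpretation.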
Perl 采用混合系统实现,Perl 程序经过部分编译,在解释之前检测错误,并简化解释器。
Perl is implemented with a hybrid system. Perl programs are partially compiled to detect errors before interpretation and to simplify the interpreter.
Java 的最初实现都是混合的。它的中间形式称为字节码,它为任何具有字节码解释器和相关运行时系统的机器提供了可移植性。这些统称为 Java 虚拟机。现在有系统将 Java 字节码转换为机器码,以便更快地执行。
Initial implementations of Java were all hybrid. Its intermediate form, called byte code, provides portability to any machine that has a byte code interpreter and an associated run-time system. Together, these are called the Java Virtual Machine. There are now systems that translate Java byte code into machine code for faster execution.
即时 (JIT) 实现系统首先将程序翻译成中间语言。然后在执行期间,当中间语言方法被调用时,它会将其编译为机器代码。机器代码版本将保留以供后续调用。JIT 系统现在广泛用于 Java 程序。此外,.NET 语言都使用 JIT 系统实现。
A Just-in-Time (JIT) implementation system initially translates programs to an intermediate language. Then, during execution, it compiles intermediate language methods into machine code when they are called. The machine code version is kept for subsequent calls. JIT systems now are widely used for Java programs. Also, the .NET languages are all implemented with a JIT system.
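The compile-on-first-call behavior can be sketched as a cache keyed by method name. In this illustration the "intermediate language" is a postfix token list and the "machine code" is a Python function. Both are stand-ins, and the class `JitRuntime` is hypothetical rather than a model of the Java or .NET runtimes.

```python
class JitRuntime:
    def __init__(self, intermediate_methods):
        # name -> intermediate form, here a postfix token list such as ["x", "x", "*"]
        self.intermediate = intermediate_methods
        self.compiled = {}       # name -> compiled callable, kept for later calls
        self.compile_count = 0

    def _compile(self, name):
        # Turn the postfix tokens into one infix expression over the parameter x.
        stack = []
        for token in self.intermediate[name]:
            if token in ("+", "-", "*"):
                right, left = stack.pop(), stack.pop()
                stack.append(f"({left} {token} {right})")
            else:
                stack.append(token)
        self.compile_count += 1
        # eval is acceptable here only because the token lists are trusted test data.
        return eval(f"lambda x: {stack.pop()}")

    def call(self, name, x):
        if name not in self.compiled:  # first call: compile and keep the result
            self.compiled[name] = self._compile(name)
        return self.compiled[name](x)


runtime = JitRuntime({"square_plus_one": ["x", "x", "*", "1", "+"]})
results = [runtime.call("square_plus_one", n) for n in range(5)]
print(results, runtime.compile_count)
```

The method is called five times but compiled once. A method that is never called is never compiled at all.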
有时,实现者可能会为一种语言提供编译和解释两种实现。在这些情况下,解释器用于开发和调试程序。然后,在达到(相对)无错误状态后,对程序进行编译以提高其执行速度。
Sometimes an implementor may provide both compiled and interpreted implementations for a language. In these cases, the interpreter is used to develop and debug programs. Then, after a (relatively) bug-free state is reached, the programs are compiled to increase their execution speed.
预处理器是一种在程序编译之前处理程序的程序。预处理器指令嵌入在程序中。预处理器本质上是一个宏扩展器。预处理器指令通常用于指定要包含来自另一个文件的代码。例如,C 预处理器指令
A preprocessor is a program that processes a program just before the program is compiled. Preprocessor instructions are embedded in programs. The preprocessor is essentially a macro expander. Preprocessor instructions are commonly used to specify that the code from another file is to be included. For example, the C preprocessor instruction
#include "myLib.h"
导致预处理器将 myLib.h 的内容复制到程序中 #include 所在的位置。
causes the preprocessor to copy the contents of myLib.h into the program at the position of the #include.
其他预处理指令用于定义表示表达式的符号。例如,可以使用
Other preprocessor instructions are used to define symbols to represent expressions. For example, one could use
#define max(A, B) ((A) > (B) ? (A) : (B))
确定两个给定表达式中的最大者。例如,表达式
to determine the largest of two given expressions. For example, the expression
x = max(2 * y, z / 1.73);
将被预处理器扩展为
would be expanded by the preprocessor to
x = ((2 * y) > (z / 1.73) ? (2 * y) : (z / 1.73));
请注意,这是表达式副作用可能导致麻烦的情况之一。例如,如果提供给 max 宏的任一表达式具有副作用(例如 z++),则可能会导致问题。由于两个表达式参数中的一个被求值两次,因此宏扩展生成的代码可能会将 z 递增两次。
Notice that this is one of those cases where expression side effects can cause trouble. For example, if either of the expressions given to the max macro has a side effect, such as z++, a problem can arise. Because one of the two expression parameters is evaluated twice, z could be incremented twice by the code produced by the macro expansion.
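Why the side effect is duplicated becomes clear once macro expansion is seen as plain text substitution. The following sketch imitates that substitution. The helper `expand_macro` is a hypothetical simplification and does not reproduce the full behavior of the C preprocessor.

```python
import re


def expand_macro(params, body, args):
    """Textually substitute each argument for its parameter, as a macro expander does."""
    mapping = dict(zip(params, args))
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, params)) + r")\b")
    # A single pass over the body, so argument text is never substituted again.
    return pattern.sub(lambda match: mapping[match.group(1)], body)


MAX_BODY = "((A) > (B) ? (A) : (B))"

plain = expand_macro(["A", "B"], MAX_BODY, ["2 * y", "z / 1.73"])
risky = expand_macro(["A", "B"], MAX_BODY, ["x", "z++"])
print(plain)
print(risky)
```

In the second expansion the text `z++` appears twice: once in the comparison and once in the result branch. If that branch is the one selected at run time, `z` is incremented twice.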
编程环境是软件开发中使用的工具的集合。该集合可能仅包含文件系统、文本编辑器、链接器和编译器。或者它可能包含大量集成工具,每个工具都通过统一的用户界面访问。在后一种情况下,软件的开发和维护过程得到了极大增强。因此,编程语言的特性并不是衡量系统软件开发能力的唯一标准。现在我们简要介绍几种编程环境。
A programming environment is the collection of tools used in the development of software. This collection may consist of only a file system, a text editor, a linker, and a compiler. Or it may include a large collection of integrated tools, each accessed through a uniform user interface. In the latter case, the process of the development and maintenance of software is greatly enhanced. Therefore, the characteristics of a programming language are not the only measure of the software development capability of a system. We now briefly describe several programming environments.
UNIX 是一个较老的编程环境,最早发布于 20 世纪 70 年代中期,基于可移植的多道程序设计操作系统构建。它提供了一系列强大的支持工具,用于以多种语言进行软件生产和维护。过去,UNIX 缺少的最重要的功能是其工具之间没有统一的界面。这使得它更难学习和使用。但是,现在 UNIX 通常通过在 UNIX 上运行的 GUI 来使用。UNIX GUI 的示例包括 Solaris 通用桌面环境 (CDE)、GNOME 和 KDE。这些 GUI 使 UNIX 的界面看起来类似于 Windows 和 Macintosh 系统的界面。
UNIX is an older programming environment, first distributed in the middle 1970s, built around a portable multiprogramming operating system. It provides a wide array of powerful support tools for software production and maintenance in a variety of languages. In the past, the most important feature absent from UNIX was a uniform interface among its tools. This made it more difficult to learn and to use. However, UNIX is now often used through a GUI that runs on top of UNIX. Examples of UNIX GUIs are the Solaris Common Desktop Environment (CDE), GNOME, and KDE. These GUIs make the interface to UNIX appear similar to that of Windows and Macintosh systems.
Borland JBuilder 是一个编程环境,它为 Java 开发提供了集成的编译器、编辑器、调试器和文件系统,所有这四个都可以通过图形界面访问。JBuilder 是一个复杂而强大的 Java 软件创建系统。
Borland JBuilder is a programming environment that provides an integrated compiler, editor, debugger, and file system for Java development, where all four are accessed through a graphical interface. JBuilder is a complex and powerful system for creating Java software.
Microsoft Visual Studio .NET 是软件开发环境发展过程中相对较新的一步。它是一个庞大而精致的软件开发工具集合,所有工具都通过窗口界面使用。该系统可用于使用五种 .NET 语言中的任何一种来开发软件:C#、Visual Basic.NET、JScript(Microsoft 版本的 JavaScript)、F#(一种函数式语言)和 C++/CLI。
Microsoft Visual Studio .NET is a relatively recent step in the evolution of software development environments. It is a large and elaborate collection of software development tools, all used through a windowed interface. This system can be used to develop software in any one of the five .NET languages: C#, Visual Basic.NET, JScript (Microsoft’s version of JavaScript), F# (a functional language), and C++/CLI.
NetBeans 是一个主要用于 Java 应用程序开发的开发环境,但也支持 JavaScript、Ruby 和 PHP。Visual Studio 和 NetBeans 不仅仅是开发环境,它们也是框架,这意味着它们实际上提供了应用程序代码的通用部分。
NetBeans is a development environment that is primarily used for Java application development but also supports JavaScript, Ruby, and PHP. Both Visual Studio and NetBeans are more than development environments—they are also frameworks, which means they actually provide common parts of the code of the application.
学习编程语言很有价值,原因如下:它提高了我们在编写程序时使用不同结构的能力,使我们能够更明智地为项目选择语言,并使学习新语言变得更容易。
The study of programming languages is valuable for some important reasons: It increases our capacity to use different constructs in writing programs, enables us to choose languages for projects more intelligently, and makes learning new languages easier.
计算机广泛应用于各种问题解决领域。特定编程语言的设计和评估高度依赖于其应用领域。
Computers are used in a wide variety of problem-solving domains. The design and evaluation of a particular programming language is highly dependent on the domain in which it is to be used.
评估语言的最重要标准包括可读性、可写性、可靠性和总体成本。这些将是我们在本书其余部分讨论的各种语言特性的审查和判断的基础。
Among the most important criteria for evaluating languages are readability, writability, reliability, and overall cost. These will be the basis on which we examine and judge the various language features discussed in the remainder of the book.
对语言设计的主要影响是机器架构和软件设计方法。
The major influences on language design have been machine architecture and software design methodologies.
设计一种编程语言主要是一项工程壮举,其中必须在特性、结构和能力之间做出一系列的权衡。
Designing a programming language is primarily an engineering feat, in which a long list of trade-offs must be made among features, constructs, and capabilities.
编程语言的实现方法主要有编译型、纯解释型和混合实现。
The major methods of implementing programming languages are compilation, pure interpretation, and hybrid implementation.
编程环境已经成为软件开发系统的重要组成部分,语言只是其中一个组成部分。
Programming environments have become important parts of software development systems, in which the language is just one of the components.
为什么对于程序员来说,拥有一些语言设计背景是有用的,即使他或她可能永远不会真正设计出一种编程语言?
Why is it useful for a programmer to have some background in language design, even though he or she may never actually design a programming language?
编程语言特性的知识如何使整个计算社区受益?
How can knowledge of programming language characteristics benefit the whole computing community?
过去 60 年里,哪种编程语言主导了科学计算?
What programming language has dominated scientific computing over the past 60 years?
过去 60 年里,哪种编程语言一直主导着商业应用?
What programming language has dominated business applications over the past 60 years?
过去 60 年里,哪种编程语言主宰了人工智能?
What programming language has dominated artificial intelligence over the past 60 years?
UNIX 大部分是用什么语言编写的?
In what language is most of UNIX written?
一种语言的特性太多会有什么缺点?
What is the disadvantage of having too many features in a language?
用户定义的运算符重载如何损害程序的可读性?
How can user-defined operator overloading harm the readability of a program?
举一个 C 设计中缺乏正交性的例子是什么?
What is one example of a lack of orthogonality in the design of C?
哪种语言使用正交性作为主要设计标准?
What language used orthogonality as a primary design criterion?
在缺乏更复杂控制语句的语言中,使用什么原始控制语句来构建更复杂的控制语句?
What primitive control statement is used to build more complicated control statements in languages that lack them?
程序的可靠性意味着什么?
What does it mean for a program to be reliable?
为什么对子程序的参数进行类型检查很重要?
Why is type checking the parameters of a subprogram important?
什么是混叠?
What is aliasing?
什么是异常处理?
What is exception handling?
为什么可读性对可写性很重要?
Why is readability important to writability?
给定语言的编译器成本与该语言的设计有何关系?
How is the cost of compilers for a given language related to the design of that language?
过去 60 年里,什么对编程语言设计影响最大?
What have been the strongest influences on programming language design over the past 60 years?
结构由冯·诺依曼计算机体系结构决定的编程语言类别的名称是什么?
What is the name of the category of programming languages whose structure is dictated by the von Neumann computer architecture?
20 世纪 70 年代的软件开发研究结果发现了哪两种编程语言的缺陷?
What two programming language deficiencies were discovered as a result of the research in software development in the 1970s?
面向对象编程语言的三个基本特征是什么?
What are the three fundamental features of an object-oriented programming language?
哪种语言最先支持面向对象编程的三个基本特性?
What language was the first to support the three fundamental features of object-oriented programming?
举个例子,说明两种语言设计标准是否直接冲突?
What is an example of two language design criteria that are in direct conflict with each other?
实现编程语言的三种一般方法是什么?
What are the three general methods of implementing a programming language?
编译器和纯解释器哪个能使程序执行得更快?
Which produces faster program execution, a compiler or a pure interpreter?
符号表在编译器中起什么作用?
What role does the symbol table play in a compiler?
链接器起什么作用?
What does a linker do?
为什么冯诺依曼瓶颈很重要?
Why is the von Neumann bottleneck important?
用纯解释器实现语言有什么优点?
What are the advantages in implementing a language with a pure interpreter?
您是否相信我们的抽象思维能力受语言能力的影响?支持您的观点。
Do you believe our capacity for abstract thought is influenced by our language skills? Support your opinion.
您所知道的特定编程语言的哪些特性,其基本原理对您来说是个谜?
What are some features of specific programming languages you know whose rationales are a mystery to you?
对于所有编程领域都使用单一语言这一想法,您能提出什么论据?
What arguments can you make for the idea of a single language for all programming domains?
您能提出哪些论据来反对使用单一语言来涵盖所有编程领域的想法?
What arguments can you make against the idea of a single language for all programming domains?
说出并解释评判语言的另一个标准(除本章讨论的标准之外)。
Name and explain another criterion by which languages can be judged (in addition to those discussed in this chapter).
您认为哪些常见的编程语言语句最有损于可读性?
What common programming language statement, in your opinion, is most detrimental to readability?
Java 使用右括号来标记所有复合语句的结束。支持和反对这种设计的论据是什么?
Java uses a right brace to mark the end of all compound statements. What are the arguments for and against this design?
许多语言会区分用户定义名称中的大写和小写字母。这种设计决策的优缺点是什么?
Many languages distinguish between uppercase and lowercase letters in user-defined names. What are the pros and cons of this design decision?
解释编程语言成本的不同方面。
Explain the different aspects of the cost of a programming language.
即使硬件相对便宜,编写高效程序的理由是什么?
What are the arguments for writing efficient programs even though hardware is relatively inexpensive?
用你知道的一些语言描述效率和安全性之间的一些设计权衡。
Describe some design trade-offs between efficiency and safety in some language you know.
您认为完美的编程语言应具备哪些主要特性?
In your opinion, what major features would a perfect programming language include?
你学习的第一门高级编程语言是用纯解释器、混合实现系统还是编译器实现的?(你可能需要研究一下这个问题。)
Was the first high-level programming language you learned implemented with a pure interpreter, a hybrid implementation system, or a compiler? (You may have to research this.)
描述一下你曾经使用过的一些编程环境的优点和缺点。
Describe the advantages and disadvantages of some programming environment you have used.
考虑到某些语言不需要简单变量的类型声明语句,它们如何影响语言的可读性?
How do type declaration statements for simple variables affect the readability of a language, considering that some languages do not require them?
使用本章中描述的标准,对你所了解的某种编程语言进行评估。
Write an evaluation of some programming language you know, using the criteria described in this chapter.
一些编程语言(例如 Pascal)使用分号来分隔语句,而 Java 则使用它来终止语句。您认为以下哪一种是最自然且最不可能导致语法错误的?支持您的答案。
Some programming languages—for example, Pascal—have used the semicolon to separate statements, while Java uses it to terminate statements. Which of these, in your opinion, is most natural and least likely to result in syntax errors? Support your answer.
许多现代语言允许两种注释:一种在两端使用分隔符(多行注释),另一种分隔符仅标记注释的开头(单行注释)。根据我们的标准讨论每种注释的优缺点。
Many contemporary languages allow two kinds of comments: one in which delimiters are used on both ends (multiple-line comments), and one in which a delimiter marks only the beginning of the comment (one-line comments). Discuss the advantages and disadvantages of each of these with respect to our criteria.
本章描述了一系列编程语言的发展。它探讨了每种语言的设计环境,并重点介绍了语言的贡献及其发展的动机。不包括整体语言描述;相反,我们只讨论每种语言引入的一些新功能。特别感兴趣的是对后续语言或计算机科学领域影响最大的功能。
This chapter describes the development of a collection of programming languages. It explores the environment in which each was designed and focuses on the contributions of the language and the motivation for its development. Overall language descriptions are not included; rather, we discuss only some of the new features introduced by each language. Of particular interest are the features that most influenced subsequent languages or the field of computer science.
本章不深入讨论任何语言特性或概念;这些留到后面的章节再讲。简要、非正式的特性解释足以让我们了解这些语言的发展历程。
This chapter does not include an in-depth discussion of any language feature or concept; that is left for later chapters. Brief, informal explanations of features will suffice for our trek through the development of these languages.
我们讨论了许多读者不熟悉的语言和语言概念。这些主题只在后面的章节中详细描述。那些对此感到不安的人可能更愿意推迟阅读本章,直到学习完本书的其余部分。
We discuss a wide variety of languages and language concepts that will not be familiar to many readers. These topics are described in detail only in later chapters. Those who find this unsettling may prefer to delay reading this chapter until the rest of the book has been studied.
选择在这里讨论的语言是主观的,有些读者会不高兴地发现他们最喜欢的一种或几种语言没有被提及。但是,为了将历史内容保持在合理的范围内,有必要省略一些人们高度重视的语言。这些选择基于我们对每种语言对语言发展和整个计算世界的重要性的估计。我们还简要讨论了本书后面提到的一些其他语言。
The choice as to which languages to discuss here was subjective, and some readers will unhappily note the absence of one or more of their favorites. However, to keep this historical coverage to a reasonable size, it was necessary to leave out some languages that some regard highly. The choices were based on our estimate of each language’s importance to language development and the computing world as a whole. We also include brief discussions of some other languages that are referenced later in the book.
本章的组织结构如下:语言的初始版本通常按时间顺序讨论。但是,语言的后续版本与其初始版本一起出现,而不是在后面的章节中出现。例如,Fortran 2003 在 Fortran I (1956) 的部分中讨论。此外,在某些情况下,与有自己章节的语言相关的次要语言也会出现在该部分中。
The organization of this chapter is as follows: The initial versions of languages generally are discussed in chronological order. However, subsequent versions of languages appear with their initial version, rather than in later sections. For example, Fortran 2003 is discussed in the section with Fortran I (1956). Also, in some cases, languages of secondary importance that are related to a language that has its own section appear in that section.
本章列出了 14 个完整的示例程序,每个程序都使用不同的语言编写。本章不介绍这些程序;它们旨在说明使用这些语言编写的程序的外观。熟悉任何常见命令式语言的读者都应该能够阅读和理解这些程序中的大部分代码,Lisp、COBOL 和 Smalltalk 中的代码除外。(第15章 讨论了类似于 Lisp 示例的 Scheme 函数。)Fortran、ALGOL 60、PL/I、Basic、Pascal、C、Perl、Ada、Java、JavaScript 和 C# 程序解决了同样的问题。请注意,此列表中的大多数当代语言都支持动态数组,但由于示例问题简单,我们没有在示例程序中使用它们。此外,在 Fortran 95 程序中,我们避免使用可以完全避免使用循环的功能,部分是为了保持程序简单易读,部分只是为了说明该语言的基本循环结构。
This chapter includes listings of 14 complete example programs, each in a different language. These programs are not described in this chapter; they are meant to illustrate the appearance of programs in these languages. Readers familiar with any of the common imperative languages should be able to read and understand most of the code in these programs, except those in Lisp, COBOL, and Smalltalk. (A Scheme function similar to the Lisp example is discussed in Chapter 15.) The same problem is solved by the Fortran, ALGOL 60, PL/I, Basic, Pascal, C, Perl, Ada, Java, JavaScript, and C# programs. Note that most of the contemporary languages in this list support dynamic arrays, but because of the simplicity of the example problem, we did not use them in the example programs. Also, in the Fortran 95 program, we avoided using the features that could have avoided the use of loops altogether, in part to keep the program simple and readable and in part just to illustrate the basic loop structure of the language.
图 2.1是本章讨论的高级语言的谱系图。
Figure 2.1 is a chart of the genealogy of the high-level languages discussed in this chapter.
本章讨论的第一种编程语言在几个方面都非常不寻常。首先,它从未被实现过。此外,尽管它于 1945 年开发,但其描述直到 1972 年才发布。由于很少有人熟悉该语言,它的某些功能直到开发 15 年后才出现在其他语言中。
The first programming language discussed in this chapter is highly unusual in several respects. For one thing, it was never implemented. Furthermore, although developed in 1945, its description was not published until 1972. Because so few people were familiar with the language, some of its capabilities did not appear in other languages until 15 years after its development.
1936 年至 1945 年间,德国科学家康拉德·楚泽 (Konrad Zuse,发音为“Tsoo-zuh”) 用机电继电器制造了一系列复杂精密的计算机。到 1945 年初,盟军轰炸摧毁了他最新的型号 Z4 中除一台之外的所有计算机,因此他搬到了巴伐利亚州的一个偏远村庄欣特施泰因,他的研究小组成员也各奔东西。
Between 1936 and 1945, German scientist Konrad Zuse (pronounced “Tsoo-zuh”) built a series of complex and sophisticated computers from electromechanical relays. By early 1945, Allied bombing had destroyed all but one of his latest models, the Z4, so he moved to a remote Bavarian village, Hinterstein, and his research group members went their separate ways.
独自一人,祖泽开始努力开发一种用于表达 Z4 计算的语言,这是他 1943 年作为博士论文提案开始的一项项目。他将这种语言命名为 Plankalkül,意思是程序演算。在一份 1945 年但直到 1972 年才发表的长篇手稿中(祖泽,1972 年),祖泽定义了 Plankalkül,并用该语言编写了算法来解决各种各样的问题。
Working alone, Zuse embarked on an effort to develop a language for expressing computations for the Z4, a project he had begun in 1943 as a proposal for his Ph.D. dissertation. He named this language Plankalkül, which means program calculus. In a lengthy manuscript dated 1945 but not published until 1972 (Zuse, 1972), Zuse defined Plankalkül and wrote algorithms in the language to solve a wide variety of problems.
Plankalkül 非常完整,在数据结构方面具有一些最先进的功能。Plankalkül 中最简单的数据类型是单个位。整数和浮点数值类型都是从位类型构建的。浮点类型使用二进制补码表示法和当前使用的“隐藏位”方案,以避免存储浮点值规范化小数部分的最高有效位。
Plankalkül was remarkably complete, with some of its most advanced features in the area of data structures. The simplest data type in Plankalkül was the single bit. Integer and floating-point numeric types were built from the bit type. The floating-point type used twos-complement notation and the “hidden bit” scheme currently used to avoid storing the most significant bit of the normalized fraction part of a floating-point value.
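The "hidden bit" idea can be illustrated with the IEEE 754 single-precision format used on current hardware. The source does not give Plankalkül's own bit layout, so this sketch shows only the present-day scheme. The function names `float32_fields` and `rebuild` are made up for the example.

```python
import struct


def float32_fields(value):
    """Split an IEEE 754 single-precision value into (sign, exponent, fraction)."""
    (bits,) = struct.unpack(">I", struct.pack(">f", value))
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF  # 23 stored bits; the leading 1 is not among them
    return sign, exponent, fraction


def rebuild(sign, exponent, fraction):
    """Reconstruct a normalized value by restoring the hidden leading 1."""
    significand = 1 + fraction / 2**23
    return (-1) ** sign * significand * 2.0 ** (exponent - 127)


print(float32_fields(1.0))
print(float32_fields(1.5))
print(rebuild(*float32_fields(-0.15625)))
```

For 1.0 the stored fraction is zero, because the only significant bit is the leading 1 and that bit is implied rather than stored. `rebuild` handles normalized values only.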
除了常见的标量类型外,Plankalkül 还包含数组和记录(在基于 C 的语言中称为结构)。记录可以包括嵌套记录。
In addition to the usual scalar types, Plankalkül included arrays and records (called structs in the C-based languages). The records could include nested records.
尽管该语言没有明确的 goto,但它确实包含一个类似于 Ada 的 for 的迭代语句。它还具有带上标的 Fin 命令,该命令指定退出给定层数的迭代循环嵌套,或转到新一轮迭代循环的开头。Plankalkül 包含一个选择语句,但不允许使用 else 子句。
Although the language had no explicit goto, it did include an iterative statement similar to the Ada for. It also had the command Fin with a superscript that specified an exit out of a given number of iteration loop nestings or to the beginning of a new iteration cycle. Plankalkül included a selection statement, but it did not allow an else clause.
Zuse 程序最有趣的特性之一是包含数学表达式,显示程序变量之间的当前关系。这些表达式说明了在代码中出现它们时执行时会发生什么。这些与 Java 的断言以及公理语义中的断言非常相似,这将在第3章 中讨论。
One of the most interesting features of Zuse’s programs was the inclusion of mathematical expressions showing the current relationships between program variables. These expressions stated what would be true during execution at the points in the code where they appeared. They are very similar to the assertions of Java and to those in axiomatic semantics, which is discussed in Chapter 3.
Zuse 的手稿中包含的程序比 1945 年之前编写的任何程序都要复杂得多。其中包括对数字数组进行排序的程序;测试给定图的连通性;执行整数和浮点运算(包括平方根);以及对具有六个不同优先级的括号和运算符的逻辑公式进行语法分析。也许最引人注目的是他 49 页的国际象棋算法,而他并不是这项游戏的专家。
Zuse’s manuscript contained programs of far greater complexity than any written prior to 1945. Included were programs to sort arrays of numbers; test the connectivity of a given graph; carry out integer and floating-point operations, including square root; and perform syntax analysis on logic formulas that had parentheses and operators in six different levels of precedence. Perhaps most remarkable were his 49 pages of algorithms for playing chess, a game in which he was not an expert.
如果计算机科学家在 20 世纪 50 年代早期发现了 Zuse 对 Plankalkül 的描述,那么阻碍该语言按定义实现的唯一因素就是符号。每个语句由两行或三行代码组成。第一行最像当前语言的语句。第二行是可选的,包含第一行中数组引用的下标。19 世纪中叶,查尔斯·巴贝奇 (Charles Babbage) 在他的分析机程序中使用了相同的指示下标的方法。每个 Plankalkül 语句的最后一行包含第一行中提到的变量的类型名称。初次看到这种符号时,您会感到十分害怕。
If a computer scientist had found Zuse’s description of Plankalkül in the early 1950s, the single aspect of the language that would have hindered its implementation as defined would have been the notation. Each statement consisted of either two or three lines of code. The first line was most like the statements of current languages. The second line, which was optional, contained the subscripts of the array references in the first line. The same method of indicating subscripts was used by Charles Babbage in programs for his Analytical Engine in the middle of the nineteenth century. The last line of each Plankalkül statement contained the type names for the variables mentioned in the first line. This notation is quite intimidating when first seen.
以下示例赋值语句将表达式 A[4] + 1 的值赋给 A[5],说明了此表示法。标有 V 的行表示下标,标有 S 的行表示数据类型。在此示例中,1.n 表示 n 位整数:
The following example assignment statement, which assigns the value of the expression A[4] + 1 to A[5], illustrates this notation. The row labeled V is for subscripts, and the row labeled S is for the data types. In this example, 1.n means an integer of n bits:
  | A + 1 => A
V | 4        5
S | 1.n      1.n
我们只能推测,如果楚泽的工作在 1945 年甚至 1950 年就已广为人知,编程语言设计可能会朝什么方向发展。同样有趣的是,如果他在一个有其他科学家包围的和平环境中完成这项工作,而不是在 1945 年在德国几乎与世隔绝的情况下完成工作,他的工作可能会有何不同。
We can only speculate on the direction that programming language design might have taken if Zuse’s work had been widely known in 1945 or even 1950. It is also interesting to consider how his work might have been different had he done it in a peaceful environment surrounded by other scientists, rather than in Germany in 1945 in virtual isolation.
首先,请注意,此处使用的伪代码一词与其当代含义不同。我们将本节讨论的语言称为伪代码,因为它们在开发和使用时(20 世纪 40 年代末和 50 年代初)就是这么命名的。然而,它们显然不是当代意义上的伪代码。
First, note that the word pseudocode is used here in a different sense than its contemporary meaning. We call the languages discussed in this section pseudocodes because that’s what they were named at the time they were developed and used (the late 1940s and early 1950s). However, they are clearly not pseudocodes in the contemporary sense.
20 世纪 40 年代末和 50 年代初出现的计算机远不如今天的计算机好用。当时的计算机不仅速度慢、可靠性低、价格昂贵、内存极小,而且由于缺乏支持软件,编程也十分困难。
The computers that became available in the late 1940s and early 1950s were far less usable than those of today. In addition to being slow, unreliable, expensive, and having extremely small memories, the machines of that time were difficult to program because of the lack of supporting software.
当时没有高级编程语言,甚至没有汇编语言,所以编程是用机器代码完成的,这既繁琐又容易出错。它的问题之一是使用数字代码来指定指令。例如,ADD 指令可能由代码 14 指定,而不是由一个有内涵的文本名称指定,即使只有一个字母。这使得程序很难阅读。更严重的问题是绝对寻址,这使得程序修改繁琐且容易出错。例如,假设我们有一个存储在内存中的机器语言程序。这种程序中的许多指令引用程序中的其他位置,通常用于引用数据或指示分支指令的目标。在程序中除末尾以外的任何位置插入指令都会使引用插入点以外地址的所有指令的正确性失效,因为必须增加这些地址才能为新指令腾出空间。为了正确进行加法,必须找到并修改引用加法后地址的所有指令。删除指令也会出现类似的问题。但在这种情况下,机器语言通常包含一个“无操作”指令,可以替换被删除的指令,从而避免该问题。
There were no high-level programming languages or even assembly languages, so programming was done in machine code, which is both tedious and error prone. Among its problems is the use of numeric codes for specifying instructions. For example, an ADD instruction might be specified by the code 14 rather than a connotative textual name, even if only a single letter. This makes programs very difficult to read. A more serious problem is absolute addressing, which makes program modification tedious and error prone. For example, suppose we have a machine language program stored in memory. Many of the instructions in such a program refer to other locations within the program, usually to reference data or to indicate the targets of branch instructions. Inserting an instruction at any position in the program other than at the end invalidates the correctness of all instructions that refer to addresses beyond the insertion point, because those addresses must be increased to make room for the new instruction. To make the addition correctly, all instructions that refer to addresses that follow the addition must be found and modified. A similar problem occurs with deletion of an instruction. In this case, however, machine languages often include a “no operation” instruction that can replace deleted instructions, thereby avoiding the problem.
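The address fix-up that absolute addressing forces on the programmer can be sketched mechanically. The tiny instruction set below (`LOAD`, `STORE`, `JUMP`, `JUMPZ`, `ADD`, `DATA`, `NOP`) is invented for this example. It only shows which operands must change when one instruction is inserted.

```python
ADDRESS_OPS = {"JUMP", "JUMPZ", "LOAD", "STORE"}  # opcodes whose operand is an address


def insert_instruction(program, position, new_instr):
    """Insert new_instr at `position` and repair every absolute address operand.

    program -- list of (opcode, operand) pairs, indexed by absolute address.
    new_instr is taken as given; any address in it must already be correct.
    """
    patched = []
    for opcode, operand in program:
        if opcode in ADDRESS_OPS and operand >= position:
            operand += 1  # the referenced location moved down by one slot
        patched.append((opcode, operand))
    patched.insert(position, new_instr)
    return patched


old_program = [
    ("LOAD", 4),   # 0: load the value stored at address 4
    ("ADD", 1),    # 1: add the constant 1 (not an address)
    ("STORE", 4),  # 2: store the result back at address 4
    ("JUMP", 0),   # 3: branch to address 0
    ("DATA", 0),   # 4: the data word
]
new_program = insert_instruction(old_program, 2, ("NOP", 0))
for address, instr in enumerate(new_program):
    print(address, instr)
```

The two references to the data word change from 4 to 5, while the branch to address 0 is left alone because its target lies before the insertion point. Doing this by hand for every insertion is the tedium that assemblers, with their symbolic addresses, were invented to remove.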
这些是所有机器语言的标准问题,也是发明汇编程序和汇编语言的主要动机。此外,当时的大多数编程问题都是数值问题,需要浮点算术运算和某种索引,以便方便使用数组。然而,这两种功能都没有包含在 20 世纪 40 年代末和 50 年代初的计算机架构中。这些缺陷自然导致了更高级语言的发展。
These are standard problems with all machine languages and were the primary motivations for inventing assemblers and assembly languages. In addition, most programming problems of that time were numerical and required floating-point arithmetic operations and indexing of some sort to allow the convenient use of arrays. Neither of these capabilities, however, was included in the architecture of the computers of the late 1940s and early 1950s. These deficiencies naturally led to the development of somewhat higher-level languages.
第一种新语言名为短代码,由 John Mauchly 于 1949 年为 BINAC 计算机开发,BINAC 是最早成功的存储程序电子计算机之一。短代码后来被转移到 UNIVAC I 计算机(美国销售的第一台商用电子计算机),并且多年来一直是编程这些机器的主要手段之一。尽管人们对原始短代码知之甚少,因为其完整描述从未发表过,但 UNIVAC I 版本的编程手册确实留存了下来(Remington-Rand,1952 年)。可以肯定这两个版本非常相似。
The first of these new languages, named Short Code, was developed by John Mauchly in 1949 for the BINAC computer, which was one of the first successful stored-program electronic computers. Short Code was later transferred to a UNIVAC I computer (the first commercial electronic computer sold in the United States) and, for several years, was one of the primary means of programming those machines. Although little is known of the original Short Code because its complete description was never published, a programming manual for the UNIVAC I version did survive (Remington-Rand, 1952). It is safe to assume that the two versions were very similar.
UNIVAC I 内存的字有 72 位,分为 12 个六位字节。短代码由要评估的数学表达式的编码版本组成。代码是字节对值,许多方程式可以编码在一个字中。包括以下操作代码:
The words of the UNIVAC I’s memory had 72 bits, grouped as 12 six-bit bytes. Short Code consisted of coded versions of mathematical expressions that were to be evaluated. The codes were byte-pair values, and many equations could be coded in a word. The following operation codes were included:
01  -        06  abs value    1n  (n+2)nd power
02  )        07  +            2n  (n+2)nd root
03  =        08  pause        4n  if <= n
04  /        09  (            58  print and tab
Variables were named with byte-pair codes, as were locations to be used as constants. For example, X0 and Y0 could be variables. The statement
X0 = SQRT(ABS(Y0))
would be coded in a word as 00 X0 03 20 06 Y0. The initial 00 was used as padding to fill the word. Interestingly, there was no multiplication code; multiplication was indicated by simply placing the two operands next to each other, as in algebra.
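The encoding of this statement can be traced with a short sketch. It uses only operation codes listed in the table (03 for =, 06 for absolute value, and 2n for the (n+2)nd root, so a square root is 20). The function names, and the rule of left-padding with 00 up to six byte pairs per word, are assumptions inferred from the single example in the text.

```python
# Operation codes taken from the table in the text.
OP_CODES = {"=": "03", "ABS": "06"}
PAIRS_PER_WORD = 6  # 12 six-bit bytes per word, two bytes per code


def root_code(degree):
    """Code for the degree-th root: '2n' denotes the (n+2)nd root."""
    n = degree - 2
    if not 0 <= n <= 9:
        raise ValueError("root degree not representable as 2n")
    return f"2{n}"


def encode(tokens):
    """Encode a token list into one word, left-padded with 00 (assumed rule)."""
    codes = []
    for token in tokens:
        if token == "SQRT":
            codes.append(root_code(2))
        else:
            # Variables such as X0 and Y0 are already byte-pair names.
            codes.append(OP_CODES.get(token, token))
    if len(codes) > PAIRS_PER_WORD:
        raise ValueError("statement does not fit in one word")
    padding = ["00"] * (PAIRS_PER_WORD - len(codes))
    return " ".join(padding + codes)


# X0 = SQRT(ABS(Y0)); parentheses are left out, as in the text's encoding.
word = encode(["X0", "=", "SQRT", "ABS", "Y0"])
print(word)  # 00 X0 03 20 06 Y0
```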
Short Code was not translated to machine code; rather, it was implemented with a pure interpreter. At the time, this process was called automatic programming. It clearly simplified the programming process, but at the expense of execution time. Short Code interpretation was approximately 50 times slower than machine code.
In other places, interpretive systems were being developed that extended machine languages to include floating-point operations. The Speedcoding system developed by John Backus for the IBM 701 is an example of such a system (Backus, 1954). The Speedcoding interpreter effectively converted the 701 to a virtual three-address floating-point calculator. The system included pseudoinstructions for the four arithmetic operations on floating-point data, as well as operations such as square root, sine, arc tangent, exponent, and logarithm. Conditional and unconditional branches and input/output conversions were also part of the virtual architecture. To get an idea of the limitations of such systems, consider that the remaining usable memory after loading the interpreter was only 700 words and that the add instruction took 4.2 milliseconds to execute. On the other hand, Speedcoding included the novel facility of automatically incrementing address registers. This facility did not appear in hardware until the UNIVAC 1107 computers of 1962. Because of such features, matrix multiplication could be done in 12 Speedcoding instructions. Backus claimed that problems that could take two weeks to program in machine code could be programmed in a few hours using Speedcoding.
Between 1951 and 1953, a team led by Grace Hopper at UNIVAC developed a series of “compiling” systems named A-0, A-1, and A-2 that expanded a pseudocode into machine code subprograms in the same way as macros are expanded into assembly language. The pseudocode source for these “compilers” was still quite primitive, although even this was a great improvement over machine code because it made source programs much shorter. Wilkes (1952) independently suggested a similar process.
Other means of easing the task of programming were being developed at about the same time. At Cambridge University, David J. Wheeler (1950) developed a method of using blocks of relocatable addresses to solve, at least partially, the problem of absolute addressing, and later, Maurice V. Wilkes (also at Cambridge) extended the idea to design an assembly program that could combine chosen subroutines and allocate storage (Wilkes et al., 1951, 1957). This was indeed an important and fundamental advance.
We should also mention that assembly languages, which are quite different from the pseudocodes discussed, evolved during the early 1950s. However, they had little impact on the design of high-level languages.
Certainly one of the greatest single advances in computing came with the introduction of the IBM 704 in 1954, in large measure because its capabilities prompted the development of Fortran. One could argue that if it had not been IBM with the 704 and Fortran, it would soon thereafter have been some other organization with a similar computer and related high-level language. However, IBM was the first with both the foresight and the resources to undertake these developments.
One of the primary reasons why the slowness of interpretive systems was tolerated from the late 1940s to the mid-1950s was the lack of floating-point hardware in the available computers. All floating-point operations had to be simulated in software, a very time-consuming process. Because so much processor time was spent in software floating-point processing, the overhead of interpretation and the simulation of indexing were relatively insignificant. As long as floating-point had to be done by software, interpretation was an acceptable expense. However, many programmers of that time never used interpretive systems, preferring the efficiency of hand-coded machine (or assembly) language. The announcement of the IBM 704 system, with both indexing and floating-point instructions in hardware, heralded the end of the interpretive era, at least for scientific computation. The inclusion of floating-point hardware removed the hiding place for the cost of interpretation.
Although Fortran is often credited with being the first compiled high-level language, the question of who deserves credit for implementing the first such language is somewhat open. Knuth and Pardo (1977) give the credit to Alick E. Glennie for his Autocode compiler for the Manchester Mark I computer. Glennie developed the compiler at Fort Halstead, Royal Armaments Research Establishment, in England. The compiler was operational by September 1952. However, according to John Backus (Wexelblat, 1981, p. 26), Glennie’s Autocode was so low level and machine oriented that it should not be considered a compiled system. Backus gives the credit to Laning and Zierler at the Massachusetts Institute of Technology.
The Laning and Zierler system (Laning and Zierler, 1954) was the first algebraic translation system to be implemented. By algebraic, we mean that it translated arithmetic expressions, used separately coded subprograms to compute transcendental functions (e.g., sine and logarithm), and included arrays. The system was implemented on the MIT Whirlwind computer, in experimental prototype form, in the summer of 1952 and in a more usable form by May 1953. The translator generated a subroutine call to code each formula, or expression, in the program. The source language was easy to read, and the only actual machine instructions included were for branching. Although this work preceded the work on Fortran, it never escaped MIT.
In spite of these earlier works, the first widely accepted compiled high-level language was Fortran. The following subsections chronicle this important development.
Even before the 704 system was announced in May 1954, plans were begun for Fortran. By November 1954, John Backus and his group at IBM had produced the report titled “The IBM Mathematical FORmula TRANslating System: FORTRAN” (IBM, 1954). This document described the first version of Fortran, which we refer to as Fortran 0, prior to its implementation. It also boldly stated that Fortran would provide the efficiency of hand-coded programs and the ease of programming of the interpretive pseudocode systems. In another burst of optimism, the document stated that Fortran would eliminate coding errors and the debugging process. Based on this premise, the first Fortran compiler included little syntax error checking.
The environment in which Fortran was developed was as follows: (1) Computers had small memories and were slow and relatively unreliable; (2) the primary use of computers was for scientific computations; (3) there were no existing efficient and effective ways to program computers; and (4) because of the high cost of computers compared to the cost of programmers, speed of the generated object code was the primary goal of the first Fortran compilers. The characteristics of the early versions of Fortran follow directly from this environment.
Fortran 0 was modified during the implementation period, which began in January 1955 and continued until the release of the compiler in April 1957. The implemented language, which we call Fortran I, is described in the first Fortran Programmer’s Reference Manual, published in October 1956 (IBM, 1956). Fortran I included input/output formatting, variable names of up to six characters (it had been just two in Fortran 0), user-defined subroutines (although they could not be separately compiled), the If selection statement, and the Do loop statement.
All of Fortran I’s control statements were based on 704 instructions. It is not clear whether the 704 designers dictated the control statement design of Fortran I or whether the designers of Fortran I suggested these instructions to the 704 designers.
There were no data-typing statements in the Fortran I language. Variables whose names began with I, J, K, L, M, and N were implicitly integer type, and all others were implicitly floating-point. The choice of the letters for this convention was based on the fact that at that time scientists and engineers used letters as variable subscripts, usually i, j, and k. In a gesture of generosity, Fortran’s designers threw in the three additional letters.
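The implicit typing rule is simple enough to state as a function. The following is a minimal sketch of the convention described above; the function name and the returned strings are illustrative choices, not part of any Fortran system.

```python
def implicit_type(name):
    """Return the Fortran I implicit type for a variable name."""
    if not name or not name[0].isalpha():
        raise ValueError("variable names must begin with a letter")
    # Names beginning with I, J, K, L, M, or N are integers; all others
    # are floating-point.
    return "integer" if name[0].upper() in "IJKLMN" else "floating-point"


for variable in ("I", "KOUNT", "N", "COUNT", "X0"):
    print(variable, implicit_type(variable))
```

Under this rule a programmer who wanted an integer counter had to choose a name such as KOUNT, because COUNT would have been floating-point.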
The most audacious claim made by the Fortran development group during the design of the language was that the machine code produced by the compiler would be about half as efficient as what could be produced by hand.¹ This, more than anything else, made skeptics of potential users and prevented a great deal of interest in Fortran before its actual release. To almost everyone’s surprise, however, the Fortran development group nearly achieved its goal in efficiency. The largest part of the 18 worker-years of effort used to construct the first compiler had been spent on optimization, and the results were remarkably effective.
The early success of Fortran is shown by the results of a survey made in April 1958. At that time, roughly half of the code being written for 704s was being written in Fortran, in spite of the skepticism of most of the programming world only a year earlier.
The Fortran II compiler was distributed in the spring of 1958. It fixed many of the bugs in the Fortran I compilation system and added some significant features to the language, the most important being the independent compilation of subroutines. Without independent compilation, any change in a program required that the entire program be recompiled. Fortran I’s lack of independent-compilation capability, coupled with the poor reliability of the 704, placed a practical restriction on the length of programs to about 300 to 400 lines (Wexelblat, 1981, p. 68). Longer programs had a poor chance of being compiled completely before a machine failure occurred. The capability of including precompiled machine language versions of subprograms shortened the compilation process considerably and made it practical to develop much larger programs.
A Fortran III was developed, but it was never widely distributed. Fortran IV, however, became one of the most widely used programming languages of its time. It evolved over the period 1960 to 1962 and was standardized as Fortran 66 (ANSI, 1966), although that name was rarely used. Fortran IV was an improvement over Fortran II in many ways. Among its most important additions were explicit type declarations for variables, a logical If construct, and the capability of passing subprograms as parameters to other subprograms.
Fortran IV was replaced by Fortran 77, which became the new standard in 1978 (ANSI, 1978a). Fortran 77 retained most of the features of Fortran IV and added character string handling, logical loop control statements, and an If with an optional Else clause.
Fortran 90 (ANSI, 1992) was dramatically different from Fortran 77. The most significant additions were dynamic arrays, records, pointers, a multiple selection statement, and modules. In addition, Fortran 90 subprograms could be recursively called.
A new concept that was included in the Fortran 90 definition was that of removing some language features from earlier versions. While Fortran 90 included all of the features of Fortran 77, the language definition included a list of constructs that were recommended for removal in the next version of the language.
Fortran 90 included two simple syntactic changes that altered the appearance of both programs and the literature describing the language. First, the required fixed format of code, which demanded the use of specific character positions for specific parts of statements, was dropped. Under the fixed format, for example, statement labels could appear only in the first five positions and statements could not begin before the seventh position. This rigid formatting of code was designed around the use of punch cards. The second change was that the official spelling of FORTRAN became Fortran. This change was accompanied by a change in the convention of using all uppercase letters for keywords and identifiers in Fortran programs. The new convention was that only the first letter of keywords and identifiers would be uppercase.
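The fixed format can be illustrated by splitting a source line at its column boundaries. The sketch below assumes the usual fixed-form layout: columns 1 to 5 for the label and column 7 onward for the statement, as stated above, plus two details the text does not mention, namely column 6 as a continuation mark and column 72 as the end of the statement field. The function name is hypothetical.

```python
def split_fixed_form(line):
    """Split one fixed-format source line into (label, continuation, statement)."""
    padded = line.ljust(72)[:72]                # keep columns 1-72 only
    label = padded[0:5].strip()                 # columns 1-5: optional statement label
    continuation = padded[5] not in (" ", "0")  # column 6: continuation mark
    statement = padded[6:].rstrip()             # columns 7-72: the statement itself
    return label, continuation, statement


label, continuation, statement = split_fixed_form("   10 X = Y + 1.0")
print(repr(label), continuation, repr(statement))
```

Because meaning depended on column position, a statement that started one column too early was read partly as a label or a continuation mark. Free-form source in Fortran 90 removed that class of error.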
Fortran 95 (INCITS/ISO/IEC, 1997) continued the evolution of the language, but only a few changes were made. Among other things, a new iteration construct, Forall, was added to ease the task of parallelizing Fortran programs.
Fortran 2003 (Metcalf et al., 2004) added support for object-oriented programming, parameterized derived types, procedure pointers, and interoperability with the C programming language.
The latest version of Fortran, Fortran 2008 (ISO/IEC 1539-1, 2010), added support for blocks to define local scopes, co-arrays, which provide a parallel execution model, and the DO CONCURRENT construct, to specify loops without interdependencies. A new version, Fortran 2015, is under development and is scheduled for release in 2018.
The original Fortran design team thought of language design only as a necessary prelude to the critical task of designing the translator. Furthermore, it never occurred to them that Fortran would be used on computers not manufactured by IBM. However, they were forced to consider building Fortran compilers for other IBM machines only because the successor to the 704, the 709, was announced before the 704 Fortran compiler was released. The effect that Fortran has had on the use of computers, along with the fact that all subsequent programming languages owe a debt to Fortran, is indeed impressive in light of the modest goals of its designers.
One of the features of Fortran I, and of all its successors before Fortran 90, that allowed highly optimizing compilers was that the types and storage for all variables were fixed before run time. No new variables or space could be allocated during execution. This was a sacrifice of flexibility to simplicity and efficiency. It eliminated the possibility of recursive subprograms and made it difficult to implement data structures that grow or change shape dynamically. Of course, the kinds of programs that were being built at the time of the development of the early versions of Fortran were primarily numerical in nature and were simple in comparison with more recent software projects. Therefore, the sacrifice was not a great one.
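Why fixed storage rules out recursion can be seen in a small model. The sketch below is an illustration only, not a description of how any Fortran compiler worked: a one-element list stands in for a statically allocated parameter, so every activation of the function shares the same location.

```python
# One fixed storage slot for the parameter, standing in for a statically
# allocated variable (an assumption made for illustration).
STATIC_N = [0]


def factorial_static(n):
    """Factorial with its parameter kept in a single shared location."""
    STATIC_N[0] = n                   # every call writes the same slot
    if STATIC_N[0] <= 1:
        return 1
    partial = factorial_static(STATIC_N[0] - 1)
    return STATIC_N[0] * partial      # the slot was overwritten by the inner call


def factorial_stack(n):
    """Ordinary recursion: each call gets its own copy of n."""
    return 1 if n <= 1 else n * factorial_stack(n - 1)


print(factorial_static(4), factorial_stack(4))  # 1 24
```

The statically allocated version returns 1 instead of 24, because by the time each multiplication runs, the innermost call has already overwritten the shared slot with 1. Correct recursion needs a fresh set of variables for each activation, which requires allocation at run time.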
The overall success of Fortran is difficult to overstate: It dramatically changed the way computers are used. This is, of course, in large part due to its being the first widely used high-level language. In comparison with concepts and languages developed later, early versions of Fortran suffer in a variety of ways, as should be expected. After all, it would not be fair to compare the performance and comfort of a 1910 Model T Ford with the performance and comfort of a 2017 Ford Mustang. Nevertheless, in spite of the inadequacies of Fortran, the momentum of the huge investment in Fortran software, among other factors, has kept it in use for 60 years.
Alan Perlis, one of the designers of ALGOL 60, said of Fortran in 1978, “Fortran is the lingua franca of the computing world. It is the language of the streets in the best sense of the word, not in the prostitutional sense of the word. And it has survived and will survive because it has turned out to be a remarkably useful part of a very vital commerce” (Wexelblat, 1981, p. 161).
The following is an example of a Fortran 95 program:
! Fortran 95 Example program
!  Input:  An integer, List_Len, where List_Len is less
!          than 100, followed by List_Len-Integer values
!  Output: The number of input values that are greater
!          than the average of all input values
Program Example
  Implicit none
  Integer, Dimension(99) :: Int_List
  Integer :: List_Len, Counter, Sum, Average, Result
  Result = 0
  Sum = 0
  Read *, List_Len
  If ((List_Len > 0) .AND. (List_Len < 100)) Then
    ! Read input data into an array and compute its sum
    Do Counter = 1, List_Len
      Read *, Int_List(Counter)
      Sum = Sum + Int_List(Counter)
    End Do
    ! Compute the average
    Average = Sum / List_Len
    ! Count the values that are greater than the average
    Do Counter = 1, List_Len
      If (Int_List(Counter) > Average) Then
        Result = Result + 1
      End If
    End Do
    ! Print the result
    Print *, 'Number of values > Average is:', Result
  Else
    Print *, 'Error - list length value is not legal'
  End If
End Program Example
The first functional programming language was invented to provide language features for list processing, the need for which grew out of the first applications in the area of artificial intelligence (AI).
Interest in AI appeared in the mid-1950s in a number of places. Some of this interest grew out of linguistics, some from psychology, and some from mathematics. Linguists were concerned with natural language processing. Psychologists were interested in modeling human information storage and retrieval, as well as other fundamental processes of the brain. Mathematicians were interested in mechanizing certain intelligent processes, such as theorem proving. All of these investigations arrived at the same conclusion: Some method must be developed to allow computers to process symbolic data in linked lists. At the time, nearly all computation was on numeric data in arrays.
The concept of list processing was developed by Allen Newell, J. C. Shaw, and Herbert Simon at the RAND Corporation. It was first published in a classic paper that describes one of the first AI programs, the Logic Theorist,² and a language in which it could be implemented (Newell and Simon, 1956). The language, named IPL-I (Information Processing Language I), was never implemented. The next version, IPL-II, was implemented on a RAND Johnniac computer. Development of IPL continued until 1960, when the description of IPL-V was published (Newell and Tonge, 1960). The low level of the IPL languages prevented their widespread use. They were actually assembly languages for a hypothetical computer, implemented with an interpreter, in which list-processing instructions were included. Another factor that kept the IPL languages from becoming popular was their implementation on the obscure Johnniac machine.
The contributions of the IPL languages were in their list design and their demonstration that list processing was feasible and useful.
IBM became interested in AI in the mid-1950s and chose theorem proving as a demonstration area. At the time, the Fortran project was still underway. The high cost of the Fortran I compiler convinced IBM that their list processing should be attached to Fortran, rather than in the form of a new language. Thus, the Fortran list processing language (FLPL) was designed and implemented as an extension to Fortran. FLPL was used to construct a theorem prover for plane geometry, which was then considered the easiest area for mechanical theorem proving.
John McCarthy of MIT took a summer position at the IBM Information Research Department in 1958. His goal for the summer was to investigate symbolic computations and to develop a set of requirements for doing such computations. As a pilot example problem area, he chose differentiation of algebraic expressions. From this study came a list of language requirements. Among them were the control flow methods of mathematical functions: recursion and conditional expressions. The only available high-level language of the time, Fortran I, had neither of these.
Another requirement that grew from the symbolic-differentiation investigation was the need for dynamically allocated linked lists and some kind of implicit deallocation of abandoned lists. McCarthy simply would not allow his elegant algorithm for differentiation to be cluttered with explicit deallocation statements.
Because FLPL did not support recursion, conditional expressions, dynamic storage allocation, or implicit deallocation, it was clear to McCarthy that a new language was needed.
When McCarthy returned to MIT in the fall of 1958, he and Marvin Minsky formed the MIT AI Project, with funding from the Research Laboratory for Electronics at MIT. The first important effort of the project was to produce a software system for list processing. It was to be used initially to implement a program proposed by McCarthy called the Advice Taker.³ This application became the impetus for the development of the list-processing language Lisp. The first version of Lisp is sometimes called “pure Lisp” because it is a purely functional language. In the following section, we describe the development of pure Lisp.
Pure Lisp has only two kinds of data structures: atoms and lists. Atoms are either symbols, which have the form of identifiers, or numeric literals. The concept of storing symbolic information in linked lists is natural and was used in IPL-II. Such structures allow insertions and deletions at any point, operations that were then thought to be a necessary part of list processing. It was eventually determined, however, that Lisp programs rarely require these operations.
Lists are specified by delimiting their elements with parentheses. Simple lists, in which elements are restricted to atoms, have the form
(A B C D)
Nested list structures are also specified by parentheses. For example, the list
(A (B C) D (E (F G)))
is composed of four elements. The first is the atom A; the second is the sublist (B C); the third is the atom D; the fourth is the sublist (E (F G)), which has as its second element the sublist (F G).
Internally, lists are stored as single-linked list structures, in which each node has two pointers and represents a list element. A node for an atom has its first pointer pointing to some representation of the atom, such as its symbol or numeric value. A node for a sublist element has its first pointer pointing to the first node of the sublist. In both cases, the second pointer of a node points to the next element of the list. A list is referenced by a pointer to its first element.
The internal representations of the two lists shown earlier are depicted in Figure 2.2. Note that the elements of a list are shown horizontally. The last element of a list has no successor, so its link is NIL, which is represented in Figure 2.2 as a diagonal line in the element. Sublists are shown with the same structure.
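The two-pointer node structure can be sketched in a few lines. In the following illustration the class and function names are hypothetical, Python's None stands in for NIL, and atoms are represented as plain strings; it is a model of the representation, not of any particular Lisp implementation.

```python
class Node:
    """One list cell: `first` refers to an atom or a sublist, `rest` to the next cell."""

    def __init__(self, first, rest=None):
        self.first = first
        self.rest = rest  # None plays the role of NIL


def build(items):
    """Build a linked structure from a nested Python list of atoms (strings)."""
    head = None
    for item in reversed(items):
        first = build(item) if isinstance(item, list) else item
        head = Node(first, head)
    return head


def show(node):
    """Render a linked structure back into parenthesized notation."""
    parts = []
    while node is not None:
        if isinstance(node.first, Node):
            parts.append(show(node.first))
        else:
            parts.append(str(node.first))
        node = node.rest
    return "(" + " ".join(parts) + ")"


nested = build(["A", ["B", "C"], "D", ["E", ["F", "G"]]])
print(show(nested))  # (A (B C) D (E (F G)))
```

Following `rest` pointers walks along one level of the list, while following a `first` pointer that refers to another node descends into a sublist.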
Lisp was designed as a functional programming language. All computation in a purely functional program is accomplished by applying functions to arguments. Neither the assignment statements nor the variables that abound in imperative language programs are necessary in functional language programs. Furthermore, repetitive processes can be specified with recursive function calls, making iteration (loops) unnecessary. These basic concepts of functional programming make it significantly different from programming in an imperative language.
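To make the contrast concrete, here is a minimal sketch of two repetitive computations written with recursion only, using no assignment statements and no loops. It is written in Python rather than Lisp, and the function names are illustrative.

```python
def length(items):
    """Count the elements of a sequence without a loop or a counter variable."""
    if not items:
        return 0
    return 1 + length(items[1:])


def total(numbers):
    """Sum a sequence of numbers by recursion on the rest of the sequence."""
    if not numbers:
        return 0
    return numbers[0] + total(numbers[1:])


print(length(("A", "B", "C", "D")), total((1, 2, 3)))  # 4 6
```

Each function is defined by a base case for the empty sequence and a recursive case for the first element followed by the rest, which is the same shape as processing a Lisp list by its first element and its remainder.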
Lisp is very different from the imperative languages, both because it is a functional programming language and because the appearance of Lisp programs is so different from those in languages like Java or C++. For example, the syntax of Java is a complicated mixture of English and algebra, while Lisp’s syntax is a model of simplicity. Program code and data have exactly the same form: parenthesized lists. Consider again the list
(A B C D)
When interpreted as data, it is a list of four elements. When viewed as code, it is the application of the function named A to the three parameters B, C, and D.
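The idea that one list can be read either as data or as code can be imitated in Python with nested lists. The sketch below is an assumption-laden illustration, not a Lisp interpreter: the function table `FUNCTIONS` and the names `ADD`, `MUL`, and `MAX` are hypothetical, and atoms are restricted to numbers.

```python
import operator

# Hypothetical function table; these names are illustrative, not Lisp built-ins.
FUNCTIONS = {
    "ADD": operator.add,
    "MUL": operator.mul,
    "MAX": max,
}


def evaluate(form):
    """Treat a list as code: the first element names a function, the rest are its arguments."""
    if not isinstance(form, list):
        return form  # an atom (here, a number) evaluates to itself
    name, *args = form
    return FUNCTIONS[name](*[evaluate(arg) for arg in args])


form = ["ADD", 2, ["MUL", 3, 4]]
print(len(form))       # viewed as data: a list of three elements
print(evaluate(form))  # viewed as code: ADD applied to 2 and the value of the MUL form
```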
Lisp completely dominated AI applications for a quarter century. Much of the cause of Lisp’s reputation for being highly inefficient has been eliminated. Many contemporary implementations are compiled, and the resulting code is much faster than running the source code on an interpreter. In addition to its success in AI, Lisp pioneered functional programming, which has proven to be a lively area of research in programming languages. As stated in Chapter 1, many programming language researchers believe functional programming is a much better approach to software development than procedural programming using imperative languages.
The following is an example of a Lisp program:
; Lisp Example function
; The following code defines a Lisp predicate function
; that takes two lists as arguments and returns True
; if the two lists are equal, and NIL (false) otherwise
(DEFUN equal_lists (lis1 lis2)
  (COND
    ((ATOM lis1) (EQ lis1 lis2))
    ((ATOM lis2) NIL)
    ((equal_lists (CAR lis1) (CAR lis2))
      (equal_lists (CDR lis1) (CDR lis2)))
    (T NIL)
  )
)
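For readers more comfortable with an imperative notation, the same recursive comparison can be sketched in Python. This is a loose transliteration under stated assumptions: nested Python lists stand in for Lisp lists, the empty list stands in for NIL, and the helper `is_atom` is a hypothetical stand-in for ATOM.

```python
def is_atom(x):
    """Anything that is not a non-empty list counts as an atom; [] stands in for NIL."""
    return not isinstance(x, list) or x == []


def equal_lists(lis1, lis2):
    """Return True if the two (possibly nested) lists are structurally equal."""
    if is_atom(lis1):
        return lis1 == lis2  # corresponds to (EQ lis1 lis2)
    if is_atom(lis2):
        return False
    # Compare the first elements (CAR), then the remaining elements (CDR).
    return equal_lists(lis1[0], lis2[0]) and equal_lists(lis1[1:], lis2[1:])


print(equal_lists(["A", ["B", "C"], "D"], ["A", ["B", "C"], "D"]))
print(equal_lists(["A", ["B", "C"], "D"], ["A", ["B", "X"], "D"]))
```

As in the Lisp version, no loop is needed: repetition over the elements is expressed entirely by the recursive calls.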
Two dialects of Lisp are now widely used, Scheme and Common Lisp. These are briefly discussed in the following subsections.
The Scheme language emerged from MIT in the mid-1970s. It is characterized by its small size, its exclusive use of static scoping (discussed in Chapter 5), and its treatment of functions as first-class entities. As first-class entities, Scheme functions can be assigned to variables, passed as parameters, and returned as the values of function applications. They can also be the elements of lists. Early versions of Lisp did not provide all of these capabilities, nor did they use static scoping.
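What it means for functions to be first-class entities can be shown with a short Python sketch. Python is used here only because it also treats functions as first-class values; the names `compose`, `double`, and `increment` are hypothetical.

```python
def compose(f, g):
    """Return a new function that applies g first, then f."""
    def composed(x):
        return f(g(x))
    return composed


def double(x):
    return 2 * x


def increment(x):
    return x + 1


twice = double                                          # assigned to a variable
double_then_increment = compose(increment, double)      # passed as parameters, returned as a value
pipeline = [double, increment, double_then_increment]   # stored as elements of a list

print([f(5) for f in pipeline])
```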
As a small language with simple syntax and semantics, Scheme is well suited to educational applications, such as courses in functional programming and general introductions to programming. Scheme is described in some detail in Chapter 15.
During the 1970s and early 1980s, a large number of different dialects of Lisp were developed and used. This led to the familiar problem of lack of portability among programs written in the various dialects. Common Lisp (Graham, 1996) was created in an effort to rectify this situation. Common Lisp was designed by combining the features of several dialects of Lisp developed in the early 1980s, including Scheme, into a single language. Being such an amalgam, Common Lisp is a relatively large and complex language. Its basis, however, is pure Lisp, so its syntax, primitive functions, and fundamental nature come from that language.
Recognizing the flexibility provided by dynamic scoping as well as the simplicity of static scoping, Common Lisp allows both. The default scoping for variables is static, but by declaring a variable to be special, that variable becomes dynamically scoped.
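The difference between the two scoping rules can be sketched in Python. Python itself is statically scoped, so the dynamic case is only emulated here with an explicit stack of bindings; `_bindings` and `dynamic_let` are hypothetical helpers, not a rendering of how Common Lisp implements special variables.

```python
from contextlib import contextmanager

# Static (lexical) scoping: Python's own rule.
x = "global"


def show_static():
    return x  # refers to the x visible where show_static is defined


def caller_static():
    x = "local to caller"  # a different variable; invisible to show_static
    return show_static()


# Dynamic scoping, emulated with an explicit stack of bindings (an assumption of this sketch).
_bindings = {"x": ["global"]}


@contextmanager
def dynamic_let(name, value):
    """Temporarily add a binding that is visible to everything called inside the block."""
    _bindings[name].append(value)
    try:
        yield
    finally:
        _bindings[name].pop()


def show_dynamic():
    return _bindings["x"][-1]  # the most recent binding still active on the call chain


def caller_dynamic():
    with dynamic_let("x", "local to caller"):
        return show_dynamic()


print(caller_static())
print(caller_dynamic())
print(show_dynamic())
```

Under static scoping the called function sees the binding from its defining environment; under dynamic scoping it sees whatever binding its caller most recently established, and that binding disappears when the caller finishes.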
Common Lisp has a large number of data types and structures, including records, arrays, complex numbers, and character strings. It also has a form of packages for modularizing collections of functions and data and for providing access control.
Common Lisp is further described in Chapter 15.
ML (MetaLanguage; Ullman, 1998) was originally designed in the 1980s by Robin Milner at the University of Edinburgh as a metalanguage for a program verification system named Logic for Computable Functions (LCF; Milner et al., 1997). ML is primarily a functional language, but it also supports imperative programming. Unlike Lisp and Scheme, the type of every variable and expression in ML can be determined at compile time. Types are associated with objects rather than names. Types of names and expressions are inferred from their context.
Unlike Lisp and Scheme, ML does not use the parenthesized functional syntax that originated with lambda expressions. Rather, the syntax of ML resembles that of the imperative languages, such as Java and C++.
Miranda was developed by David Turner (1986) at the University of Kent in Canterbury, England, in the early 1980s. Miranda is based partly on the languages ML, SASL, and KRC. Haskell (Hudak and Fasel, 1992) is based in large part on Miranda. Like Miranda, it is a purely functional language, having no variables and no assignment statement. Another distinguishing characteristic of Haskell is its use of lazy evaluation. This means that no expression is evaluated until its value is required. This leads to some surprising capabilities in the language.
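One of the capabilities that lazy evaluation makes possible is working with conceptually infinite data. Python is not lazy by default, but generators and explicit thunks give a rough approximation; the sketch below assumes the hypothetical names `squares` and `lazy_if` and should not be read as a description of Haskell's evaluation model.

```python
from itertools import count, islice


def squares():
    """A conceptually infinite stream of squares; each value is computed only on demand."""
    for n in count(1):
        yield n * n


def lazy_if(condition, then_thunk, else_thunk):
    """Evaluate only the branch that is actually selected."""
    return then_thunk() if condition else else_thunk()


first_five = list(islice(squares(), 5))
safe = lazy_if(True, lambda: "ok", lambda: 1 // 0)  # the division by zero never runs

print(first_five)
print(safe)
```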
Caml (Cousineau et al., 1998) and its dialect that supports object-oriented programming, OCaml (Smith, 2006), descended from ML and Haskell. Finally, F# is a relatively new typed language based directly on OCaml. F# (Syme et al., 2010) is a .NET language with direct access to the whole .NET library. Being a .NET language also means it can smoothly interoperate with any other .NET language. F# supports both functional programming and procedural programming. It also fully supports object-oriented programming.
ML, Haskell, and F# are further discussed in Chapter 15.
ALGOL 60 strongly influenced subsequent programming languages and is therefore of central importance in any historical study of languages.
ALGOL 60 was the result of efforts to design a universal programming language for scientific applications. By late 1954, the Laning and Zierler algebraic system had been in operation for over a year, and the first report on Fortran had been published. Fortran became a reality in 1957, and several other high-level languages were being developed. Most notable among them were IT, which was designed by Alan Perlis at Carnegie Tech, and two languages for the UNIVAC computers, MATH-MATIC and UNICODE. The proliferation of languages made program sharing among users difficult. Furthermore, the new languages were all growing up around single architectures, some for UNIVAC computers and some for IBM 700-series machines. In response to this blossoming of machine-dependent languages, several major computer user groups in the United States, including SHARE (the IBM scientific user group) and USE (UNIVAC Scientific Exchange, the large-scale UNIVAC scientific user group), submitted a petition to the Association for Computing Machinery (ACM) on May 10, 1957, to form a committee to study and recommend action to create a machine-independent scientific programming language. Although Fortran might have been a candidate, it could not become a universal language, because at the time it was solely owned by IBM.
Previously, in 1955, GAMM (a German acronym for Society for Applied Mathematics and Mechanics) had formed a committee to design one universal, machine-independent algorithmic language. The desire for this new language was in part due to the Europeans’ fear of being dominated by IBM. By late 1957, however, the appearance of several high-level languages in the United States convinced the GAMM subcommittee that their effort had to be widened to include the Americans, and a letter of invitation was sent to ACM. In April 1958, after Fritz Bauer of GAMM presented the formal proposal to ACM, the two groups officially agreed to a joint language design project.
GAMM and ACM each sent four members to the first design meeting. The meeting, which was held in Zurich from May 27 to June 1, 1958, began with the following goals for the new language:
The syntax of the language should be as close as possible to standard mathematical notation, and programs written in it should be readable with little further explanation.
It should be possible to use the language for the description of algorithms in printed publications.
Programs in the new language must be mechanically translatable into machine language.
The first goal indicated that the new language was to be used for scientific programming, which was the primary computer application area at that time. The second was something entirely new to the computing business. The last goal is an obvious necessity for any programming language.
The Zurich meeting succeeded in producing a language that met the stated goals, but the design process required innumerable compromises, both among individuals and between the two sides of the Atlantic. In some cases, the compromises were not so much over great issues as they were over spheres of influence. The question of whether to use a comma (the European method) or a period (the American method) for a decimal point is one example.
The language designed at the Zurich meeting was named the International Algorithmic Language (IAL). It was suggested during the design that the language be named ALGOL, for ALGOrithmic Language, but the name was rejected because it did not reflect the international scope of the committee. During the following year, however, the name was changed to ALGOL, and the language subsequently became known as ALGOL 58.
In many ways, ALGOL 58 was a descendant of Fortran, which is quite natural. It generalized many of Fortran’s features and added several new constructs and concepts. Some of the generalizations had to do with the goal of not tying the language to any particular machine, and others were attempts to make the language more flexible and powerful. A rare combination of simplicity and elegance emerged from the effort.
ALGOL 58 formalized the concept of data type, although only variables that were not floating-point required explicit declaration. It added the idea of compound statements, which most subsequent languages incorporated. Some features of Fortran that were generalized were the following: Identifiers were allowed to have any length, as opposed to Fortran I’s restriction to six or fewer characters; any number of array dimensions was allowed, unlike Fortran I’s limitation to no more than three; the lower bound of arrays could be specified by the programmer, whereas in Fortran it was implicitly 1; nested selection statements were allowed, which was not the case in Fortran I.
ALGOL 58 acquired the assignment operator in a rather unusual way. Zuse used the form
expression => variable
for the assignment statement in Plankalkül. Although Plankalkül had not yet been published, some of the European members of the ALGOL 58 committee were familiar with the language. The committee dabbled with the Plankalkül assignment form but, because of arguments about character set limitations,4 the greater-than symbol was changed to a colon. Then, largely at the insistence of the Americans, the whole statement was turned around to the Fortran form
variable := expression
The Europeans preferred the opposite form, but that would be the reverse of Fortran.
In December 1958, publication of the ALGOL 58 report (Perlis and Samelson, 1958) was greeted with enthusiasm. In the United States, the new language was viewed more as a collection of ideas for programming language design than as a universal standard language. Actually, the ALGOL 58 report was not meant to be a finished product but rather a preliminary document for international discussion. Nevertheless, three major design and implementation efforts used the report as their basis. At the University of Michigan, the MAD language was born (Arden et al., 1961). The U.S. Naval Electronics Group produced the NELIAC language (Huskey et al., 1963). At System Development Corporation, JOVIAL was designed and implemented (Shaw, 1963). JOVIAL, an acronym for Jules’ Own Version of the International Algebraic Language, represents the only language based on ALGOL 58 to achieve widespread use (Jules was Jules I. Schwartz, one of JOVIAL’s designers). JOVIAL became widely used because it was the official scientific language for the U.S. Air Force for a quarter century.
The rest of the U.S. computing community was not so kind to the new language. At first, both IBM and its major scientific user group, SHARE, seemed to embrace ALGOL 58. IBM began an implementation shortly after the report was published, and SHARE formed a subcommittee, SHARE IAL, to study the language. The subcommittee subsequently recommended that ACM standardize ALGOL 58 and that IBM implement it for all of the 700-series computers. The enthusiasm was short-lived, however. By the spring of 1959, both IBM and SHARE, through their Fortran experience, had had enough of the pain and expense of getting a new language started, both in terms of developing and using the first-generation compilers and in terms of training users in the new language and persuading them to use it. By the middle of 1959, both IBM and SHARE had developed such a vested interest in Fortran that they decided to retain it as the scientific language for the IBM 700-series machines, thereby abandoning ALGOL 58.
During 1959, ALGOL 58 was furiously debated in both Europe and the United States. Large numbers of suggested modifications and additions were published in the European ALGOL Bulletin and in Communications of the ACM. One of the most important events of 1959 was the presentation of the work of the Zurich committee to the International Conference on Information Processing, for there Backus introduced his new notation for describing the syntax of programming languages, which later became known as BNF (Backus-Naur form). BNF is described in detail in Chapter 3.
In January 1960, the second ALGOL meeting was held, this time in Paris. The purpose of the meeting was to debate the 80 suggestions that had been formally submitted for consideration. Peter Naur of Denmark had become heavily involved in the development of ALGOL, even though he had not been a member of the Zurich group. It was Naur who created and published the ALGOL Bulletin. He spent a good deal of time studying Backus’s paper that introduced BNF and decided that BNF should be used to describe formally the results of the 1960 meeting. After making a few relatively minor changes to BNF, he wrote a description of the new proposed language in BNF and handed it out to the members of the 1960 group at the beginning of the meeting.
Although the 1960 meeting lasted only six days, the modifications made to ALGOL 58 were dramatic. Among the most important new developments were the following:
The concept of block structure was introduced. This allowed the programmer to localize parts of programs by introducing new data environments, or scopes.
Two different means of passing parameters to subprograms were allowed: pass by value and pass by name.
Procedures were allowed to be recursive. The ALGOL 58 description was unclear on this issue. Note that although recursion was new for the imperative languages, Lisp had already provided recursive functions in 1959.
Stack-dynamic arrays were allowed. A stack-dynamic array is one for which the subscript range or ranges are specified by variables, so that the size of the array is set at the time storage is allocated to the array, which happens when the declaration is reached during execution. Stack-dynamic arrays are described in detail in Chapter 6.
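Of the developments listed above, pass by name is the least familiar today. Its effect can be approximated in Python by passing a parameterless function (a thunk) that is re-evaluated every time the parameter is used. The sketch below is an illustration under stated assumptions: the names `sum_by_name`, `sum_by_value`, `set_i`, and `env` are hypothetical, and the technique shown is the one commonly called Jensen's device, not a rendering of any actual ALGOL 60 implementation.

```python
def sum_by_name(set_i, lo, hi, term):
    """`term` is a thunk re-evaluated on every use, imitating a name parameter."""
    total = 0
    for k in range(lo, hi + 1):
        set_i(k)         # assign to the caller's index variable
        total += term()  # re-evaluate the actual-parameter expression
    return total


def sum_by_value(lo, hi, term):
    """Pass by value: `term` was evaluated once, before the call."""
    return sum(term for _ in range(lo, hi + 1))


env = {"i": 1}  # the caller's variable i, held in a mutable cell
a = [10, 20, 30, 40]


def set_i(v):
    env["i"] = v


# By name: the expression a[i] is re-evaluated as i changes, so all four elements are summed.
by_name = sum_by_name(set_i, 1, 4, lambda: a[env["i"] - 1])

# By value: a[i] is evaluated once with i = 1, so the first element is added four times.
env["i"] = 1
by_value = sum_by_value(1, 4, a[env["i"] - 1])

print(by_name, by_value)
```

The same call shape produces two different results depending only on the parameter-passing method, which hints at why pass by name proved both powerful and hard to reason about.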
Several features that might have had a dramatic impact on the success or failure of the language were proposed and rejected. Most important among these were input and output statements with formatting, which were omitted because they were thought to be machine-dependent.
The ALGOL 60 report was published in May 1960 (Naur, 1960). A number of ambiguities still remained in the language description, and a third meeting was scheduled for April 1962 in Rome to address the problems. At this meeting the group dealt only with problems; no additions to the language were allowed. The results of this meeting were published under the title “Revised Report on the Algorithmic Language ALGOL 60” (Backus et al., 1963).
In some ways, ALGOL 60 was a great success; in other ways, it was a dismal failure. It succeeded in becoming, almost immediately, the only acceptable formal means of communicating algorithms in computing literature, and it remained that for more than 20 years. Every imperative programming language designed since 1960 owes something to ALGOL 60. In fact, most are direct or indirect descendants; examples include PL/I, SIMULA 67, ALGOL 68, C, Pascal, Ada, C++, Java, and C#.
The ALGOL 58/ALGOL 60 design effort included a long list of firsts. It was the first time that an international group attempted to design a programming language. It was the first language that was designed to be machine independent. It was also the first language whose syntax was formally described. This successful use of the BNF formalism initiated several important fields of computer science: formal languages, parsing theory, and BNF-based compiler design. Finally, the structure of ALGOL 60 affected machine architecture. In the most striking example of this, an extension of the language was used as the systems language of a series of large-scale computers, the Burroughs B5000, B6000, and B7000 machines, which were designed with a hardware stack to implement efficiently the block structure and recursive subprograms of the language.
On the other side of the coin, ALGOL 60 never achieved widespread use in the United States. Even in Europe, where it was more popular than in the United States, it never became the dominant language. There are a number of reasons for its lack of acceptance. For one thing, some of the features of ALGOL 60 turned out to be too flexible; they made understanding difficult and implementation inefficient. The best example of this is the pass-by-name method of passing parameters to subprograms, which is explained in Chapter 9. The difficulties of implementing ALGOL 60 are evidenced by Rutishauser’s statement in 1967 that few, if any, implementations included the full ALGOL 60 language (Rutishauser, 1967, p. 8).
The lack of input and output statements in the language was another major reason for its lack of acceptance. Implementation-dependent input/output made programs difficult to port to other computers.
Ironically, one of the most important contributions to computer science associated with ALGOL 60, BNF, was also a factor in its lack of acceptance. Although BNF is now considered a simple and elegant means of syntax description, in 1960 it seemed strange and complicated.
Finally, although there were many other problems, the entrenchment of Fortran among users and the lack of support by IBM were probably the most important factors in ALGOL 60’s failure to gain widespread use.
The ALGOL 60 effort was never really complete, in the sense that ambiguities and obscurities were always a part of the language description (Knuth, 1967).
The following is an example of an ALGOL 60 program:
comment ALGOL 60 Example Program
 Input:  An integer, listlen, where listlen is less than
         100, followed by listlen-integer values
 Output: The number of input values that are greater than
         the average of all the input values ;
begin
  integer array intlist [1:99];
  integer listlen, counter, sum, average, result;
  sum := 0;
  result := 0;
  readint (listlen);
  if (listlen > 0) ∧ (listlen < 100) then
    begin
comment Read input into an array and compute the average;
    for counter := 1 step 1 until listlen do
      begin
      readint (intlist[counter]);
      sum := sum + intlist[counter]
      end;
comment Compute the average;
    average := sum / listlen;
comment Count the input values that are > average;
    for counter := 1 step 1 until listlen do
      if intlist[counter] > average
        then result := result + 1;
comment Print result;
    printstring ("The number of values > average is:");
    printint (result)
    end
  else
    printstring ("Error-input list length is not legal");
end
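For comparison, the logic of the ALGOL 60 example can be sketched in Python. Several assumptions are made here and are not part of the original program: input is supplied as a list rather than read with `readint`, the function name `count_above_average` is hypothetical, and the result is returned as a string instead of being printed piecewise. The rounding step reflects the ALGOL 60 rule that a real value assigned to an integer variable is rounded to the nearest integer.

```python
import math


def count_above_average(values):
    """Mirror of the ALGOL 60 example: values[0] is listlen, followed by the data."""
    listlen = values[0]
    if not (0 < listlen < 100):
        return "Error-input list length is not legal"
    intlist = values[1:1 + listlen]
    total = sum(intlist)
    # Assigning a real quotient to an integer variable rounds: entier(x + 0.5).
    average = math.floor(total / listlen + 0.5)
    result = sum(1 for v in intlist if v > average)
    return f"The number of values > average is: {result}"


print(count_above_average([5, 1, 2, 3, 4, 10]))
print(count_above_average([0]))
```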
The story of COBOL is, in a sense, the opposite of that of ALGOL 60. Although it has been used for nearly 60 years, COBOL has had little effect on the design of subsequent languages, except for PL/I. It may still be the most widely used language,5 although it is difficult to be sure one way or the other. Perhaps the most important reason why COBOL has had little influence is that few have attempted to design a new language for business applications since it appeared. That is due in part to how well COBOL’s capabilities meet the needs of its application area. Another reason is that a great deal of growth in business computing over the past 30 years has occurred in small businesses. In these businesses, very little software development has taken place. Instead, most of the software used is purchased as off-the-shelf packages for various general business applications.
The beginning of COBOL is somewhat similar to that of ALGOL 60, in the sense that the language was designed by a committee of people meeting for relatively short periods of time. At the time, in 1959, the state of business computing was similar to the state of scientific computing several years earlier, when Fortran was being designed. One compiled language for business applications, FLOW-MATIC, had been implemented in 1957, but it belonged to one manufacturer, UNIVAC, and was designed for that company’s computers. Another language, AIMACO, was being used by the U.S. Air Force, but it was only a minor variation of FLOW-MATIC. IBM had designed a programming language for business applications, COMTRAN (COMmercial TRANslator), but it had not yet been implemented. Several other language design projects were being planned.
The origins of FLOW-MATIC are worth at least a brief discussion, because it was the primary progenitor of COBOL. In December 1953, Grace Hopper at Remington-Rand UNIVAC wrote a proposal that was indeed prophetic. It suggested that “mathematical programs should be written in mathematical notation, data processing programs should be written in English statements” (Wexelblat, 1981, p. 16). Unfortunately, in 1953, it was impossible to convince nonprogrammers that a computer could be made to understand English words. It was not until 1955 that a similar proposal had some hope of being funded by UNIVAC management, and even then it took a prototype system to do the final convincing. Part of this selling process involved compiling and running a small program, first using English keywords, then using French keywords, and then using German keywords. This demonstration was considered remarkable by UNIVAC management and was instrumental in their acceptance of Hopper’s proposal.
The first formal meeting on the subject of a common language for business applications, which was sponsored by the Department of Defense, was held at the Pentagon on May 28 and 29, 1959 (exactly one year after the Zurich ALGOL meeting). The consensus of the group was that the language, then named CBL (Common Business Language), should have the following general characteristics: Most agreed that it should use English as much as possible, although a few argued for a more mathematical notation. The language must be easy to use, even at the expense of being less powerful, in order to broaden the base of those who could program computers. In addition to making the language easy to use, it was believed that the use of English would allow managers to read programs. Finally, the design should not be overly restricted by the problems of its implementation.
One of the overriding concerns at the meeting was that steps to create this universal language should be taken quickly, as a lot of work was already being done to create other business languages. In addition to the existing languages, RCA and Sylvania were working on their own business applications languages. It was clear that the longer it took to produce a universal language, the more difficult it would be for the language to become widely used. On this basis, it was decided that there should be a quick study of existing languages. For this task, the Short Range Committee was formed.
There were early decisions to separate the statements of the language into two categories—data description and executable operations—and to have statements in these two categories be in different parts of programs. One of the debates of the Short Range Committee was over the inclusion of subscripts. Many committee members argued that subscripts were too complex for the people in data processing, who were thought to be uncomfortable with mathematical notation. Similar arguments revolved around whether arithmetic expressions should be included. The final report of the Short Range Committee, which was completed in December 1959, described the language that was later named COBOL 60.
The language specification for COBOL 60, published by the Government Printing Office in April 1960 (Department of Defense, 1960), was described as “initial.” Revised versions were published in 1961 and 1962 (Department of Defense, 1961, 1962). The language was standardized by the American National Standards Institute (ANSI) group in 1968. The next three revisions were standardized by ANSI in 1974, 1985, and 2002. The language continues to evolve today.
The COBOL language originated a number of novel concepts, some of which eventually appeared in other languages. For example, the DEFINE verb of COBOL 60 was the first high-level language construct for macros. More important, hierarchical data structures (records), which first appeared in Plankalkül, were first implemented in COBOL. They have been included in most of the imperative languages designed since then. COBOL was also the first language that allowed names to be truly connotative, because it allowed both long names (up to 30 characters) and word-connector characters (hyphens).
Overall, the data division is the strong part of COBOL’s design, whereas the procedure division is relatively weak. Every variable is defined in detail in the data division, including the number of decimal digits and the location of the implied decimal point. File records are also described with this level of detail, as are lines to be output to a printer, which makes COBOL ideal for printing accounting reports. Perhaps the most important weakness of the original procedure division was in its lack of functions. Versions of COBOL prior to the 1974 standard also did not allow subprograms with parameters.
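To make the implied decimal point concrete, the following is a minimal Python sketch (not COBOL, and not taken from the original text) of how a five-digit field declared as PICTURE 999V99 might be interpreted. The stored field holds digits only; the V in the picture tells the program where the decimal point belongs. The function name decode_implied_decimal and the sample value are hypothetical, introduced only for illustration.

```python
from decimal import Decimal

def decode_implied_decimal(digits: str, fraction_digits: int) -> Decimal:
    """Interpret a digits-only field whose decimal point is implied.

    For PICTURE 999V99, fraction_digits is 2: the last two digits are
    the fractional part, and no '.' character is stored in the field.
    """
    if not (digits.isascii() and digits.isdigit()):
        raise ValueError("field must contain only the digits 0-9")
    # scaleb shifts the decimal exponent without any floating-point rounding.
    return Decimal(int(digits)).scaleb(-fraction_digits)

# Hypothetical sample: the five characters "01995" stand for 19.95.
price = decode_implied_decimal("01995", 2)
print(price)
```

Decimal is used here rather than float because, like the COBOL data division, it keeps an exact count of decimal digits.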
Our final comment on COBOL: It was the first programming language whose use was mandated by the Department of Defense (DoD). This mandate came after its initial development, because COBOL was not designed specifically for the DoD. In spite of its merits, COBOL probably would not have survived without that mandate. The poor performance of the early compilers simply made the language too expensive to use. Eventually, of course, compilers became more efficient and computers became much faster and cheaper and had much larger memories. Together, these factors allowed COBOL to succeed, inside and outside DoD. Its appearance led to the electronic mechanization of accounting, an important development by any measure.
The following is an example of a COBOL program. This program reads a file named BAL-FWD-FILE that contains inventory information about a certain collection of items. Among other things, each item record includes the number currently on hand (BAL-ON-HAND) and the item’s reorder point (BAL-REORDER-POINT). The reorder point is the threshold number of items on hand at which more must be ordered. The program produces a list of items that must be reordered as a file named REORDER-LISTING.
IDENTIFICATION DIVISION.
PROGRAM-ID.  PRODUCE-REORDER-LISTING.
ENVIRONMENT DIVISION.
CONFIGURATION SECTION.
SOURCE-COMPUTER.  DEC-VAX.
OBJECT-COMPUTER.  DEC-VAX.
INPUT-OUTPUT SECTION.
FILE-CONTROL.
    SELECT BAL-FWD-FILE     ASSIGN TO READER.
    SELECT REORDER-LISTING  ASSIGN TO LOCAL-PRINTER.
DATA DIVISION.
FILE SECTION.
FD  BAL-FWD-FILE
    LABEL RECORDS ARE STANDARD
    RECORD CONTAINS 80 CHARACTERS.
01  BAL-FWD-CARD.
    02  BAL-ITEM-NO         PICTURE IS 9(5).
    02  BAL-ITEM-DESC       PICTURE IS X(20).
    02  FILLER              PICTURE IS X(5).
    02  BAL-UNIT-PRICE      PICTURE IS 999V99.
    02  BAL-REORDER-POINT   PICTURE IS 9(5).
    02  BAL-ON-HAND         PICTURE IS 9(5).
    02  BAL-ON-ORDER        PICTURE IS 9(5).
    02  FILLER              PICTURE IS X(30).
FD  REORDER-LISTING
    LABEL RECORDS ARE STANDARD
    RECORD CONTAINS 132 CHARACTERS.
01  REORDER-LINE.
    02  RL-ITEM-NO          PICTURE IS Z(5).
    02  FILLER              PICTURE IS X(5).
    02  RL-ITEM-DESC        PICTURE IS X(20).
    02  FILLER              PICTURE IS X(5).
    02  RL-UNIT-PRICE       PICTURE IS ZZZ.99.
    02  FILLER              PICTURE IS X(5).
    02  RL-AVAILABLE-STOCK  PICTURE IS Z(5).
    02  FILLER              PICTURE IS X(5).
    02  RL-REORDER-POINT    PICTURE IS Z(5).
    02  FILLER              PICTURE IS X(71).
WORKING-STORAGE SECTION.
01  SWITCHES.
    02  CARD-EOF-SWITCH     PICTURE IS X.
01  WORK-FIELDS.
    02  AVAILABLE-STOCK     PICTURE IS 9(5).
PROCEDURE DIVISION.
000-PRODUCE-REORDER-LISTING.
    OPEN INPUT BAL-FWD-FILE.
    OPEN OUTPUT REORDER-LISTING.
    MOVE "N" TO CARD-EOF-SWITCH.
    PERFORM 100-PRODUCE-REORDER-LINE
        UNTIL CARD-EOF-SWITCH IS EQUAL TO "Y".
    CLOSE BAL-FWD-FILE.
    CLOSE REORDER-LISTING.
    STOP RUN.
100-PRODUCE-REORDER-LINE.
    PERFORM 110-READ-INVENTORY-RECORD.
    IF CARD-EOF-SWITCH IS NOT EQUAL TO "Y"
        PERFORM 120-CALCULATE-AVAILABLE-STOCK
        IF AVAILABLE-STOCK IS LESS THAN BAL-REORDER-POINT
            PERFORM 130-PRINT-REORDER-LINE.
110-READ-INVENTORY-RECORD.
    READ BAL-FWD-FILE RECORD
        AT END
            MOVE "Y" TO CARD-EOF-SWITCH.
120-CALCULATE-AVAILABLE-STOCK.
    ADD BAL-ON-HAND BAL-ON-ORDER
        GIVING AVAILABLE-STOCK.
130-PRINT-REORDER-LINE.
    MOVE SPACE TO REORDER-LINE.
    MOVE BAL-ITEM-NO TO RL-ITEM-NO.
    MOVE BAL-ITEM-DESC TO RL-ITEM-DESC.
    MOVE BAL-UNIT-PRICE TO RL-UNIT-PRICE.
    MOVE AVAILABLE-STOCK TO RL-AVAILABLE-STOCK.
    MOVE BAL-REORDER-POINT TO RL-REORDER-POINT.
    WRITE REORDER-LINE.
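For readers unfamiliar with COBOL, the decision logic of the program above can be sketched in Python. This is only an illustrative approximation, not an equivalent program: it slices an 80-character record at the offsets implied by the BAL-FWD-CARD layout and applies the same test (stock on hand plus stock on order is less than the reorder point). The helper names parse_record, needs_reorder, and make_card, and the two sample records, are assumptions made for this sketch.

```python
def parse_record(card: str) -> dict:
    """Slice one 80-character record using the BAL-FWD-CARD layout."""
    if len(card) != 80:
        raise ValueError("record must be exactly 80 characters")
    return {
        "item_no": card[0:5],                # BAL-ITEM-NO        9(5)
        "description": card[5:25].rstrip(),  # BAL-ITEM-DESC      X(20)
        # card[25:30] is FILLER X(5); card[30:35] is BAL-UNIT-PRICE.
        "reorder_point": int(card[35:40]),   # BAL-REORDER-POINT  9(5)
        "on_hand": int(card[40:45]),         # BAL-ON-HAND        9(5)
        "on_order": int(card[45:50]),        # BAL-ON-ORDER       9(5)
    }

def needs_reorder(rec: dict) -> bool:
    """Available stock is on-hand plus on-order, as in paragraph 120."""
    available = rec["on_hand"] + rec["on_order"]
    return available < rec["reorder_point"]

def make_card(item_no, desc, price, reorder, on_hand, on_order) -> str:
    """Build a sample record (hypothetical data, not from the text)."""
    return (f"{item_no:05d}{desc:<20}{'':5}{price:05d}"
            f"{reorder:05d}{on_hand:05d}{on_order:05d}{'':30}")

cards = [
    make_card(101, "WIDGET", 1995, 50, 20, 10),  # 30 available, point is 50
    make_card(102, "GADGET", 250, 10, 8, 5),     # 13 available, point is 10
]
records = [parse_record(card) for card in cards]
to_reorder = [rec["item_no"] for rec in records if needs_reorder(rec)]
print(to_reorder)
```

Note how much of the Python version is spent restating field positions that the COBOL data division declares once, which is one way to see why the data division is regarded as the strong part of the design.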
Basic (Mather and Waite, 1971) is another programming language that has enjoyed widespread use but has gotten little respect. Like COBOL, it has largely been ignored by computer scientists. Also, like COBOL, in its earliest versions it was inelegant and included only a meager set of control statements.
Basic was very popular on microcomputers in the late 1970s and early 1980s. This followed directly from two of the main characteristics of early versions of Basic. It was easy for beginners to learn, especially those who were not science oriented, and its smaller dialects could be implemented on computers with very small memories.6 When the capabilities of microcomputers grew and other languages were implemented, the use of Basic waned. A strong resurgence in the use of Basic began with the appearance of Visual Basic (Microsoft, 1991) in the early 1990s.
Basic (Beginner’s All-purpose Symbolic Instruction Code) was originally designed at Dartmouth College in New Hampshire by two mathematicians, John Kemeny and Thomas Kurtz, who, in the early 1960s, developed compilers for a variety of dialects of Fortran and ALGOL 60. Their science students generally had little trouble learning or using those languages in their studies. However, Dartmouth was primarily a liberal arts institution, where science and engineering students made up only about 25 percent of the student body. It was decided in the spring of 1963 to design a new language especially for liberal arts students. This new language would use terminals as the method of computer access. The goals of the system were as follows:
It must be easy for nonscience students to learn and use.
It must be “pleasant and friendly.”
It must provide fast turnaround for homework.
It must allow free and private access.
It must consider user time more important than computer time.
The last goal was indeed a revolutionary concept. It was based at least partly on the belief that computers would become significantly cheaper as time went on, which they certainly did.
The combination of the second, third, and fourth goals led to the time-shared aspect of Basic. Only with individual access through terminals by numerous simultaneous users could these goals be met in the early 1960s.
In the summer of 1963, Kemeny began work on the compiler for the first version of Basic, using remote access to a GE 225 computer. Design and coding of the operating system for Basic began in the fall of 1963. At 4:00 a.m. on May 1, 1964, the first program using the time-shared Basic was typed in and run. In June, the number of terminals on the system grew to 11, and by the fall it had ballooned to 20.
The original version of Basic was very small and, oddly, was not interactive: There was no way for an executing program to get input data from the user. Programs were typed in, compiled, and run, in a sort of batch-oriented way. The original Basic had only 14 different statement types and a single data type—floating-point. Because it was believed that few of the targeted users would appreciate the difference between integer and floating-point types, the type was referred to as “numbers.” Overall, it was a very limited language, though quite easy to learn.
The most important aspect of the original Basic was that it was the first widely used language that was used through terminals connected to a remote computer.7 Terminals had just begun to be available at that time. Before then, most programs were entered into computers through either punched cards or paper tape.
Much of the design of Basic came from Fortran, with some minor influence from the syntax of ALGOL 60. Later, it grew in a variety of ways, with little or no effort made to standardize it. The American National Standards Institute issued a Minimal Basic standard (ANSI, 1978b), but this represented only the bare minimum of language features. In fact, the original Basic was very similar to Minimal Basic.
Although it may seem surprising, Digital Equipment Corporation used a rather elaborate version of Basic named Basic-PLUS to write significant portions of their largest operating system for the PDP-11 minicomputers, RSTS, in the 1970s.
Basic has been criticized for the poor structure of programs written in it, among other things. By the evaluation criteria discussed in Chapter 1, specifically readability and reliability, the language does indeed fare very poorly. Clearly, the early versions of the language were not meant for and should not have been used for serious programs of any significant size. Later versions are much better suited to such tasks.
The resurgence of Basic in the 1990s was driven by the appearance of Visual Basic (VB). VB became widely used in large part because it provided a simple way of building graphical user interfaces (GUIs), hence the name Visual Basic. When .NET appeared, a new version of VB came with it, VB.NET. Although it was a significant departure from earlier versions of VB, it quickly displaced the older language. Perhaps the most important difference between VB and the .NET version is that the later version fully supports object-oriented programming.
The following is an example of a Basic program:
REM Basic Example Program
REM  Input:  An integer, listlen, where listlen is less
REM          than 100, followed by listlen-integer values
REM  Output: The number of input values that are greater
REM          than the average of all input values
  DIM intlist(99)
  result = 0
  sum = 0
  INPUT listlen
  IF listlen > 0 AND listlen < 100 THEN
REM  Read input into an array and compute the sum
    FOR counter = 1 TO listlen
      INPUT intlist(counter)
      sum = sum + intlist(counter)
    NEXT counter
REM  Compute the average
    average = sum / listlen
REM  Count the number of input values that are > average
    FOR counter = 1 TO listlen
      IF intlist(counter) > average THEN result = result + 1
    NEXT counter
REM  Print the result
    PRINT "The number of values that are > average is:"; result
  ELSE
    PRINT "Error-input list length is not legal"
  END IF
END
PL/I represents the first large-scale attempt to design a language that could be used for a broad spectrum of application areas. All previous and most subsequent languages have focused on one particular application area, such as science, artificial intelligence, or business.
Like Fortran, PL/I was developed as an IBM product. By the early 1960s, the users of computers in industry had settled into two separate and quite different camps: scientific and business. From the IBM point of view, scientific programmers could use either the large-scale 7090 or the small-scale 1620 IBM computers. This group used floating-point data and arrays extensively. Fortran was the primary language, although some assembly language also was used. They had their own user group, SHARE, and had little contact with anyone who worked on business applications.
For business applications, people used the large 7080 or the small 1401 IBM computers. They needed the decimal and character string data types, as well as elaborate and efficient input and output facilities. They used COBOL, although in early 1963 when the PL/I story begins, the conversion from assembly language to COBOL was far from complete. This category of users also had its own user group, GUIDE, and seldom had contact with scientific users.
In early 1963, IBM planners perceived the beginnings of a change in this situation. The two widely separated computer user groups were moving toward each other in ways that were thought certain to create problems. Scientists began to gather large files of data to be processed. This data required more sophisticated and more efficient input and output facilities. Business applications people began to use regression analysis to build management information systems, which required floating-point data and arrays. It began to appear that computing installations would soon require two separate computers and technical staffs, supporting two very different programming languages.8
These perceptions naturally led to the concept of designing a single universal computer that would be capable of doing both floating-point and decimal arithmetic, and therefore both scientific and business applications. Thus was born the concept of the IBM System/360 line of computers. Along with this came the idea of a programming language that could be used for both business and scientific applications. For good measure, features to support systems programming and list processing were thrown in. Therefore, the new language was to replace Fortran, COBOL, Lisp, and the systems applications of assembly language.
The design effort began when IBM and SHARE formed the Advanced Language Development Committee of the SHARE Fortran Project in October 1963. This new committee quickly met and formed a subcommittee called the 3 ⨯ 3 Committee, so named because it had three members from IBM and three from SHARE. The 3 ⨯ 3 Committee met for three or four days every other week to design the language.
As with the Short Range Committee for COBOL, the initial design was scheduled for completion in a remarkably short time. Apparently, regardless of the scope of a language design effort, in the early 1960s the prevailing belief was that it could be done in three months. The first version of PL/I, which was then named Fortran VI, was supposed to be completed by December, less than three months after the committee was formed. The committee pleaded successfully on two different occasions for extensions, moving the due date back to January and then to late February 1964.
The initial design concept was that the new language would be an extension of Fortran IV, maintaining compatibility, but that goal was dropped quickly along with the name Fortran VI. Until 1965, the language was known as NPL (New Programming Language). The first published report on NPL was given at the SHARE meeting in March 1964. A more complete description followed in April, and the version that would actually be implemented was published in December 1964 (IBM, 1964) by the compiler group at the IBM Hursley Laboratory in England, which was chosen to do the implementation. In 1965, the name was changed to PL/I to avoid the confusion of the name NPL with the National Physical Laboratory in England. If the compiler had been developed outside the United Kingdom, the name might have remained NPL.
Perhaps the best single-sentence description of PL/I is that it included what were then considered the best parts of ALGOL 60 (recursion and block structure), Fortran IV (separate compilation with communication through global data), and COBOL 60 (data structures, input/output, and report-generating facilities), along with an extensive collection of new constructs, all somehow cobbled together. Because PL/I is no longer a popular language, we will not attempt, even briefly, to discuss all the features of the language, or even its most controversial constructs. Instead, we will mention some of the language’s contributions to the pool of knowledge of programming languages.
PL/I was the first programming language to have the following facilities:
Programs were allowed to create concurrently executing subprograms. Although this was a good idea, it was poorly developed in PL/I.
It was possible to detect and handle 23 different types of exceptions, or run-time errors.
Subprograms were allowed to be used recursively, but the capability could be disabled, allowing more efficient linkage for nonrecursive subprograms.
Pointers were included as a data type.
Cross-sections of arrays could be referenced. For example, the third row of a matrix could be referenced as if it were a single-dimensioned array.
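The idea of an array cross-section in the last item can be illustrated with a short Python sketch. This is not PL/I syntax; it only shows the concept of treating one row, or one column, of a matrix as a one-dimensional array. The matrix values are made up for the example.

```python
# A 3 x 3 matrix stored as a list of rows (sample data for illustration).
matrix = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]

# Row cross-section: the third row, usable as a one-dimensional list.
third_row = matrix[2]

# Column cross-section: built by taking one element from each row.
second_column = [row[1] for row in matrix]

print(third_row)
print(second_column)
```

In this representation a row cross-section is a direct reference, while a column cross-section has to be constructed, which hints at why built-in language support for cross-sections was considered a notable facility.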
Any evaluation of PL/I must begin by recognizing the ambitiousness of the design effort. In retrospect, it appears naive to think that so many constructs could have been combined successfully. However, that judgment must be tempered by acknowledging that there was little language design experience at the time. Overall, the design of PL/I was based on the premise that any construct that was useful and could be implemented should be included, with insufficient concern about how a programmer could understand and make effective use of such a collection of constructs and features. Edsger Dijkstra, in his Turing Award Lecture (Dijkstra, 1972), made one of the strongest criticisms of the complexity of PL/I: “I absolutely fail to see how we can keep our growing programs firmly within our intellectual grip when by its sheer baroqueness the programming language—our basic tool, mind you!—already escapes our intellectual control.”
In addition to the problem with the complexity due to its large size, PL/I suffered from a number of what are now considered to be poorly designed constructs. Among these were pointers, exception handling, and concurrency, although we must point out that in all cases, these constructs had not appeared in any previous language.
In terms of usage, PL/I must be considered at least a partial success. In the 1970s, it enjoyed significant use in both business and scientific applications. It was also widely used during that time as an instructional vehicle in colleges, primarily in several subset forms, such as PL/C (Cornell, 1977) and PL/CS (Conway and Constable, 1976).
The following is an example of a PL/I program:
/* PL/I PROGRAM EXAMPLE
   INPUT:  AN INTEGER, LISTLEN, WHERE LISTLEN IS LESS THAN
           100, FOLLOWED BY LISTLEN-INTEGER VALUES
   OUTPUT: THE NUMBER OF INPUT VALUES THAT ARE GREATER THAN
           THE AVERAGE OF ALL INPUT VALUES */
PLIEX: PROCEDURE OPTIONS (MAIN);
  DECLARE INTLIST (1:99) FIXED;
  DECLARE (LISTLEN, COUNTER, SUM, AVERAGE, RESULT) FIXED;
  SUM = 0;
  RESULT = 0;
  GET LIST (LISTLEN);
  IF (LISTLEN > 0) & (LISTLEN < 100) THEN
    DO;
    /* READ INPUT DATA INTO AN ARRAY AND COMPUTE THE SUM */
    DO COUNTER = 1 TO LISTLEN;
      GET LIST (INTLIST (COUNTER));
      SUM = SUM + INTLIST (COUNTER);
    END;
    /* COMPUTE THE AVERAGE */
    AVERAGE = SUM / LISTLEN;
    /* COUNT THE NUMBER OF VALUES THAT ARE > AVERAGE */
    DO COUNTER = 1 TO LISTLEN;
      IF INTLIST (COUNTER) > AVERAGE THEN
        RESULT = RESULT + 1;
    END;
    /* PRINT RESULT */
    PUT SKIP LIST ('THE NUMBER OF VALUES > AVERAGE IS:');
    PUT LIST (RESULT);
    END;
  ELSE
    PUT SKIP LIST ('ERROR-INPUT LIST LENGTH IS ILLEGAL');
END PLIEX;
The structure of this section is different from that of the other sections because the languages discussed here are very different. Neither APL nor SNOBOL had much influence on later mainstream languages.9 Some of the interesting features of APL are discussed later in the book.
In appearance and in purpose, APL and SNOBOL are quite different. They share two fundamental characteristics, however: dynamic typing and dynamic storage allocation. Variables in both languages are essentially untyped. A variable acquires a type when it is assigned a value, at which time it assumes the type of the value assigned. Storage is allocated to a variable only when it is assigned a value, because before that there is no way to know the amount of storage that will be needed.
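Dynamic typing of this kind can be observed in any dynamically typed language. The sketch below uses Python rather than APL or SNOBOL, purely as an illustration: the same variable takes on the type of whatever value was most recently assigned to it. The variable names are arbitrary.

```python
# The variable has no declared type; each assignment determines it.
value = 42
first_type = type(value).__name__      # type comes from the integer value

value = "forty-two"
second_type = type(value).__name__     # same variable, now bound to a string

value = [4, 2]
third_type = type(value).__name__      # and now bound to a list

print(first_type, second_type, third_type)
```

Because the size of the value is not known until assignment, storage for each new value is also obtained at that moment, which parallels the dynamic storage allocation described above.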
APL (Brown et al., 1988) was designed around 1960 by Kenneth E. Iverson at IBM. It was not originally designed to be an implemented programming language but rather was intended to be a vehicle for describing computer architecture. APL was first described in the book from which it gets its name, A Programming Language (Iverson, 1962). In the mid-1960s, the first implementation of APL was developed at IBM.
APL has a large number of powerful operators that are specified with a large number of symbols, which created a problem for implementors. Initially, APL was used through IBM printing terminals. These terminals had special optional print balls that provided the odd character set required by the language. One reason APL has so many operators is that it provides a large number of unit operations on arrays. For example, the transpose of any matrix is done with a single operator. The large collection of operators provides very high expressivity but also makes APL programs difficult to read. Therefore, people think of APL as a language that is best used for “throw-away” programming. Although programs can be written quickly, they should be discarded after use because they are difficult to maintain.
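To give a sense of what a unit operation on a whole array saves, here is a rough Python sketch of matrix transposition. Python has no built-in transpose operator for nested lists, so a small helper is needed; the function name transpose and the sample matrix are assumptions for this example and have no connection to APL's notation.

```python
def transpose(matrix):
    """Return the transpose of a rectangular list-of-lists matrix."""
    # zip(*matrix) groups the i-th element of every row into one tuple.
    return [list(column) for column in zip(*matrix)]

m = [
    [1, 2, 3],
    [4, 5, 6],
]
t = transpose(m)
print(t)
```

In APL the whole helper collapses to a single operator symbol applied to the array, which is the source of both its expressivity and its reputation for being hard to read.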
APL has been around for over 55 years and is still used today, although not widely. Furthermore, it has not changed a great deal over its lifetime.
SNOBOL (pronounced “snowball”; Griswold et al., 1971) was designed in the early 1960s by three people at Bell Laboratories: D. J. Farber, R. E. Griswold, and I. P. Polonsky (Farber et al., 1964). It was designed specifically for text processing. The heart of SNOBOL is a collection of powerful operations for string pattern matching. One of the early applications of SNOBOL was for writing text editors. Because the dynamic nature of SNOBOL makes it slower than alternative languages, it is no longer used for such programs. However, SNOBOL is still a live and supported language that is used for a variety of text-processing tasks in several different application areas.
Although SIMULA 67 never achieved widespread use and had little impact on the programmers and computing of its time, some of the constructs it introduced make it historically important.
Two Norwegians, Kristen Nygaard and Ole-Johan Dahl, developed the language SIMULA I between 1962 and 1964 at the Norwegian Computing Center (NCC) in Oslo. They were primarily interested in using computers for simulation but also worked in operations research. SIMULA I was designed exclusively for system simulation and was first implemented in late 1964 on a UNIVAC 1107 computer.
As soon as the SIMULA I implementation was completed, Nygaard and Dahl began efforts to extend the language by adding new features and modifying some existing constructs in order to make the language useful for general-purpose applications. The result of this work was SIMULA 67, whose design was first presented publicly in March 1967 (Dahl and Nygaard, 1967). We will discuss only SIMULA 67, although some of the features of interest in SIMULA 67 are also in SIMULA I.
SIMULA 67 is an extension of ALGOL 60, taking both block structure and the control statements from that language. The primary deficiency of ALGOL 60 (and other languages at that time) for simulation applications was the design of its subprograms. Simulation requires subprograms that are allowed to restart at the position where they previously stopped. Subprograms with this kind of control are known as coroutines because the caller and called subprograms have a somewhat equal relationship with each other, rather than the rigid master/slave relationship they have in most imperative languages.
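The idea of a subprogram that resumes where it previously stopped can be sketched with Python generators. This is only an approximation: generators are asymmetric (a scheduler decides who runs next), whereas the coroutines described above treat caller and callee more equally. The names `customer` and `run_round_robin` are hypothetical and chosen for this illustration.

```python
def customer(name, steps):
    """A simulated process: it suspends after each step and later
    resumes at exactly the point where it stopped."""
    for step in steps:
        yield f"{name}: {step}"


def run_round_robin(processes):
    """Resume each live process in turn until all have finished."""
    trace = []
    active = list(processes)
    while active:
        for proc in list(active):      # iterate over a copy; we may remove
            try:
                trace.append(next(proc))   # resume where proc left off
            except StopIteration:
                active.remove(proc)        # this process has finished
    return trace


trace = run_round_robin([
    customer("A", ["arrive", "pay"]),
    customer("B", ["arrive", "order", "pay"]),
])
print(trace)
```

Each call to `next` continues a process from its last `yield`, so the local state of every simulated customer survives between activations, which is the property simulation programs need.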
To provide support for coroutines in SIMULA 67, the class construct was developed. This was an important development because the concept of data abstraction began with it and data abstraction provides the foundation for object-oriented programming.
It is interesting to note that the important concept of data abstraction was not developed and attributed to the class construct until 1972, when Hoare (1972) recognized the connection.
ALGOL 68 was the source of several new ideas in language design, some of which were subsequently adopted by other languages. We include it here for that reason, even though it never achieved widespread use in either Europe or the United States.
The development of the ALGOL family did not end when the revised report on ALGOL 60 appeared in 1962, although it was six years until the next design iteration was published. The resulting language, ALGOL 68 (van Wijngaarden et al., 1969), was dramatically different from its predecessor.
One of the most interesting innovations of ALGOL 68 was one of its primary design criteria: orthogonality. Recall our discussion of orthogonality in Chapter 1. The use of orthogonality resulted in several innovative features of ALGOL 68, one of which is described in the following section.
One important result of orthogonality in ALGOL 68 was its inclusion of user-defined data types. Earlier languages, such as Fortran, included only a few basic data structures. PL/I included a larger number of data structures, which made it harder to learn and difficult to implement, but it still could not provide an appropriate data structure for every need.
The approach of ALGOL 68 to data structures was to provide a few primitive types and structures and allow the user to combine those primitives to define a large number of different structures. This provision for user-defined data types was carried over to some extent into all of the major imperative languages designed since then. User-defined data types are valuable because they allow the user to design data abstractions that fit particular problems very closely. All aspects of data types are discussed in Chapter 6.
As another first in the area of data types, ALGOL 68 introduced the kind of dynamic arrays that will be termed implicit heap-dynamic in Chapter 5. A dynamic array is one in which the declaration does not specify subscript bounds. Assignments to a dynamic array cause allocation of required storage. In ALGOL 68, dynamic arrays are called flex arrays.
ALGOL 68 includes a significant number of features that had not been previously used. Its use of orthogonality, which some may argue was overdone, was nevertheless revolutionary.
ALGOL 68 repeated one of the sins of ALGOL 60, however, and it was an important factor in its limited popularity. The language was described using an elegant and concise but also unknown metalanguage. Before one could read the language-describing document (van Wijngaarden et al., 1969), he or she had to learn the new metalanguage, called van Wijngaarden grammars, which were far more complex than BNF. To make matters worse, the designers invented a collection of words to explain the grammar and the language. For example, keywords were called indicants, substring extraction was called trimming, and the process of a subprogram execution was called a coercion of deproceduring, which might be meek, firm, or something else.
It is natural to contrast the design of PL/I with that of ALGOL 68, because they appeared only a few years apart. ALGOL 68 achieved writability by the principle of orthogonality: a few primitive concepts and the unrestricted use of a few combining mechanisms. PL/I achieved writability by including a large number of fixed constructs. ALGOL 68 extended the elegant simplicity of ALGOL 60, whereas PL/I simply threw together the features of several languages to attain its goals. Of course, it must be remembered that the goal of PL/I was to provide a unified tool for a broad class of problems, whereas ALGOL 68 was targeted to a single class: scientific applications.
PL/I achieved far greater acceptance than ALGOL 68, due largely to IBM’s promotional efforts and the problems of understanding and implementing ALGOL 68. Implementation was a difficult problem for both, but PL/I had the resources of IBM to apply to building a compiler. ALGOL 68 enjoyed no such benefactor.
All imperative languages owe some of their design to ALGOL 60 and/or ALGOL 68. This section discusses some of the early descendants of these languages.
Niklaus Wirth (Wirth is pronounced “Virt”) was a member of the International Federation for Information Processing (IFIP) Working Group 2.1, which was created to continue the development of ALGOL in the mid-1960s. In August 1965, Wirth and C. A. R. (“Tony”) Hoare contributed to that effort by presenting to the group a somewhat modest proposal for additions and modifications to ALGOL 60 (Wirth and Hoare, 1966). The majority of the group rejected the proposal as being too small an improvement over ALGOL 60. Instead, a much more complex revision was developed, which eventually became ALGOL 68. Wirth, along with a few other group members, did not believe that the ALGOL 68 report should have been released, based on the complexity of both the language and the metalanguage used to describe it. This position later proved to have some validity because the ALGOL 68 documents, and therefore the language, were indeed found to be challenging by the computing community.
The Wirth and Hoare version of ALGOL 60 was named ALGOL-W. It was implemented at Stanford University and was used primarily as an instructional vehicle, but only at a few universities. The primary contributions of ALGOL-W were the value-result method of passing parameters and the case statement for multiple selection. The value-result method is an alternative to ALGOL 60’s pass-by-name method. Both are discussed in Chapter 9. The case statement is discussed in Chapter 8.
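The value-result method can be simulated in a few lines. The sketch below is written in Python rather than ALGOL-W, and the helper `call_by_value_result`, the dictionary `env` standing in for the caller's variables, and the `swap` subprogram are all assumptions made for illustration. Actual parameter values are copied in at the call, the subprogram works only on its local copies, and the final values are copied back out at return.

```python
def call_by_value_result(subprogram, env, arg_names):
    """Simulate pass-by-value-result on the variables named in arg_names.

    env is a dict that plays the role of the caller's variables.
    """
    # Copy in: the formals start as copies of the actual parameters.
    local_values = [env[name] for name in arg_names]
    # The subprogram sees only its local copies and returns their final values.
    final_values = subprogram(*local_values)
    # Copy out: the final values are written back to the caller's variables.
    for name, value in zip(arg_names, final_values):
        env[name] = value


def swap(a, b):
    a, b = b, a
    return a, b


env = {"x": 1, "y": 2}
call_by_value_result(swap, env, ["x", "y"])
print(env)
```

Because the caller's variables change only at the copy-out step, the subprogram never aliases them while it runs, which is the main difference from pass-by-name and pass-by-reference.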
Wirth’s next major design effort, again based on ALGOL 60, was his most successful: Pascal. The original published definition of Pascal appeared in 1971 (Wirth, 1971). This version was modified somewhat in the implementation process and is described in Wirth (1973). The features that are often ascribed to Pascal in fact came from earlier languages. For example, user-defined data types were introduced in ALGOL 68, the case statement in ALGOL-W, and Pascal’s records are similar to those of COBOL and PL/I.
The largest impact of Pascal was on the teaching of programming. In 1970, most students of computer science, engineering, and science were introduced to programming with Fortran, although some universities used PL/I, languages based on PL/I, and ALGOL-W. By the mid-1970s, Pascal had become the most widely used language for this purpose. This was quite natural, because Pascal was designed specifically for teaching programming. It was not until the late 1990s that Pascal was no longer the most commonly used language for teaching programming in colleges and universities.
Because Pascal was designed as a teaching language, it lacks several features that are essential for many kinds of applications. The best example of this is the impossibility of writing a subprogram that takes as a parameter an array of variable length. Another example is the lack of any separate compilation capability. These deficiencies naturally led to many nonstandard dialects, such as Turbo Pascal.
Pascal’s popularity, for both teaching programming and other applications, was based primarily on its remarkable combination of simplicity and expressivity. Although there are some insecurities in Pascal, it is still a relatively safe language, particularly when compared with Fortran or C. By the mid-1990s, the popularity of Pascal was on the decline, both in industry and in universities, primarily due to the rise of Modula-2, Ada, and C++, all of which included features not available in Pascal.
The following is an example of a Pascal program:
{Pascal Example Program
 Input:  An integer, listlen, where listlen is less than
         100, followed by listlen-integer values
 Output: The number of input values that are greater than
         the average of all input values }
program pasex (input, output);
  type intlisttype = array [1..99] of integer;
  var
    intlist : intlisttype;
    listlen, counter, sum, average, result : integer;
  begin
  result := 0;
  sum := 0;
  readln (listlen);
  if ((listlen > 0) and (listlen < 100)) then
    begin
    { Read input into an array and compute the sum }
    for counter := 1 to listlen do
      begin
      readln (intlist[counter]);
      sum := sum + intlist[counter]
      end;
    { Compute the average; div is integer division, which is
      required here because average is an integer }
    average := sum div listlen;
    { Count the number of input values that are > average }
    for counter := 1 to listlen do
      if (intlist[counter] > average) then
        result := result + 1;
    { Print the result }
    writeln ('The number of values > average is:',
             result)
    end { of the then clause of if (( listlen > 0 ... }
  else
    writeln ('Error-input list length is not legal')
  end.
Like Pascal, C contributed little to the previously known collection of language features, but it has been widely used over a long period of time. Although originally designed for systems programming, C is well suited for a wide variety of applications.
C’s ancestors include CPL, BCPL, B, and ALGOL 68. CPL was developed at Cambridge University in the early 1960s. BCPL is a simple systems language, also developed at Cambridge, this time by Martin Richards in 1967 (Richards, 1969).
The first work on the UNIX operating system was done in the late 1960s by Ken Thompson at Bell Laboratories. The first version was written in assembly language. The first high-level language implemented under UNIX was B, which was based on BCPL. B was designed and implemented by Thompson in 1970.
Neither BCPL nor B is a typed language, which is an oddity among high-level languages, although both are much lower-level than a language such as Java. Being untyped means that all data are considered machine words, which, although simple, leads to many complications and insecurities. For example, there is the problem of specifying floating-point rather than integer arithmetic in an expression. In one implementation of BCPL, the variable operands of a floating-point operation were preceded by periods. Variable operands not preceded by periods were considered to be integers. An alternative to this would have been to use different symbols for the floating-point operators.
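To see why untyped machine words make the choice of arithmetic a problem, consider what happens when two words that hold floating-point values are added with integer addition. The sketch below is in Python, not BCPL or B, and assumes 32-bit words holding IEEE 754 single-precision values; the helper names `float_to_word` and `word_to_float` are hypothetical.

```python
import struct


def float_to_word(value):
    """Return the 32-bit word whose bits encode value as an IEEE 754 float."""
    return struct.unpack("<I", struct.pack("<f", value))[0]


def word_to_float(word):
    """Interpret a 32-bit word as an IEEE 754 single-precision float."""
    return struct.unpack("<f", struct.pack("<I", word))[0]


a = float_to_word(1.5)     # an untyped word that happens to hold 1.5
b = float_to_word(2.25)    # an untyped word that happens to hold 2.25

# Integer addition on the raw words gives a meaningless bit pattern.
integer_sum = (a + b) & 0xFFFFFFFF

# Floating-point addition must be requested explicitly, because nothing
# in the words themselves says how they should be interpreted.
float_sum = float_to_word(word_to_float(a) + word_to_float(b))

print(hex(integer_sum), hex(float_sum), word_to_float(float_sum))
```

Since the words carry no type information, the programmer has to mark either the operands or the operator to select the intended arithmetic, which is exactly the difficulty described above.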
This problem, along with several others, led to the development of a new typed language based on B. Originally called NB but later named C, it was designed and implemented by Dennis Ritchie at Bell Laboratories in 1972 (Kernighan and Ritchie, 1978). In some cases through BCPL, and in other cases directly, C was influenced by ALGOL 68. This is seen in its for and switch statements, in its assigning operators, and in its treatment of pointers.
The only “standard” for C in its first decade and a half was the book by Kernighan and Ritchie (1978). Over that time span, the language slowly evolved, with different implementors adding different features. In 1989, ANSI produced an official description of C (ANSI, 1989), which included many of the features that implementors had already incorporated into the language. This standard was updated in 1999 (ISO, 1999). This later version includes a few significant changes to the language. Among these are a complex data type, a Boolean data type, and C++-style comments (//). We will refer to the 1989 version, which has long been called ANSI C, as C89; we will refer to the 1999 version as C99.
C has adequate control statements and data-structuring facilities to allow its use in many application areas. It also has a rich set of operators that provide a high degree of expressiveness.
One of the most important reasons why C is both liked and disliked is its lack of complete type checking. For example, in versions before C99, functions could be written for which parameters were not type checked. Those who like C appreciate the flexibility; those who do not like it find it too insecure. A major reason for its great increase in popularity in the 1980s was that a compiler for it was part of the widely used UNIX operating system. This inclusion in UNIX provided an essentially free and quite good compiler that was available to programmers on many different kinds of computers.
The following is an example of a C program:
/* C Example Program
   Input:  An integer, listlen, where listlen is less than
           100, followed by listlen-integer values
   Output: The number of input values that are greater than
           the average of all input values */
#include <stdio.h>  /* declares scanf and printf */
int main (){
  int intlist[99], listlen, counter, sum, average, result;
  sum = 0;
  result = 0;
  scanf("%d", &listlen);
  if ((listlen > 0) && (listlen < 100)) {
    /* Read input into an array and compute the sum */
    for (counter = 0; counter < listlen; counter++) {
      scanf("%d", &intlist[counter]);
      sum += intlist[counter];
    }
    /* Compute the average (integer division) */
    average = sum / listlen;
    /* Count the input values that are > average */
    for (counter = 0; counter < listlen; counter++)
      if (intlist[counter] > average) result++;
    /* Print result */
    printf("Number of values > average is:%d\n", result);
  }
  else
    printf("Error-input list length is not legal\n");
  return 0;
}
Simply put, logic programming is the use of a formal logic notation to communicate computational processes to a computer. Predicate calculus is the notation used in current logic programming languages.
Programming in logic programming languages is nonprocedural. Programs in such languages do not state exactly how a result is to be computed but rather describe the necessary form and/or characteristics of the result. What is needed to provide this capability in logic programming languages is a concise means of supplying the computer with both the relevant information and an inferencing process for computing desired results. Predicate calculus supplies the basic form of communication to the computer, and the proof method, named resolution, developed first by Robinson (1965), supplies the inferencing technique.
During the early 1970s, Alain Colmerauer and Philippe Roussel in the Artificial Intelligence Group at the University of Aix-Marseille, together with Robert Kowalski of the Department of Artificial Intelligence at the University of Edinburgh, developed the fundamental design of Prolog. The primary components of Prolog are a method for specifying predicate calculus propositions and an implementation of a restricted form of resolution. Both predicate calculus and resolution are described in Chapter 16. The first Prolog interpreter was developed at Marseille in 1972. The version of the language that was implemented is described in Roussel (1975). The name Prolog is from programming logic.
Prolog programs consist of collections of statements. Prolog has only a few kinds of statements, but they can be complex.
One common use of Prolog is as a kind of intelligent database. This application provides a simple framework for discussing the Prolog language.
The database of a Prolog program consists of two kinds of statements: facts and rules. The following are examples of fact statements:
mother(joanne, jake).
father(vern, joanne).
These state that joanne is the mother of jake, and vern is the father of joanne.
An example of a rule statement is
grandparent(X, Z) :- parent(X, Y), parent(Y, Z).
This states that it can be deduced that X is the grandparent of Z if it is true that X is the parent of Y and Y is the parent of Z, for some specific values for the variables X, Y, and Z.
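The meaning of the grandparent rule can be approximated outside Prolog as a join over a set of facts. The Python sketch below is not how a Prolog system works internally (it performs no resolution or backtracking), and it assumes that the `parent` relation has already been derived from the `mother` and `father` facts given earlier; the function name `grandparents` is hypothetical.

```python
# Assumed parent facts, derived from mother(joanne, jake) and
# father(vern, joanne); each pair is (parent, child).
parent = {("joanne", "jake"), ("vern", "joanne")}


def grandparents(parent_facts):
    """Return every (X, Z) such that parent(X, Y) and parent(Y, Z)
    both hold for some Y."""
    return {
        (x, z)
        for (x, y1) in parent_facts
        for (y2, z) in parent_facts
        if y1 == y2          # the shared variable Y links the two facts
    }


derived = grandparents(parent)
print(derived)
```

The shared variable Y in the rule corresponds to the equality test in the comprehension: a grandparent pair is produced only when the child in one fact is the parent in another.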
The Prolog database can be interactively queried with goal statements, an example of which is
father(bob, darcie).
This asks if bob is the father of darcie. When such a query, or goal, is presented to the Prolog system, it uses its resolution process to attempt to determine the truth of the statement. If it can conclude that the goal is true, it displays “true.” If it cannot prove it, it displays “false.”
In the 1980s, there was a relatively small group of computer scientists who believed that logic programming provided the best hope for escaping from the complexity of imperative languages, and also from the enormous problem of producing the large amount of reliable software that was needed. So far, however, there are two major reasons why logic programming has not become more widely used. First, as with some other nonimperative approaches, programs written in logic languages thus far have proven to be highly inefficient relative to equivalent imperative programs. Second, it has been determined that it is an effective approach for only a few relatively small areas of application: certain kinds of database management systems and some areas of AI.
There is a dialect of Prolog that supports object-oriented programming: Prolog++ (Moss, 1994). Logic programming and Prolog are described in greater detail in Chapter 16.
The Ada language is the result of the most extensive and expensive language design effort ever undertaken. The following paragraphs briefly describe the evolution of Ada.
The Ada language was developed for the Department of Defense (DoD), so the state of their computing environment was instrumental in determining its form. By 1974, over half of the applications of computers in DoD were embedded systems. An embedded system is one in which the computer hardware is embedded in the device it controls or for which it provides services. Software costs were rising rapidly, primarily because of the increasing complexity of systems. More than 450 different programming languages were in use for DoD projects, and none of them was standardized by DoD. Every defense contractor could define a new and different language for every contract. Because of this language proliferation, application software was rarely reused. Furthermore, no software development tools were created (because they are usually language dependent). A great many languages were in use, but none was actually suitable for embedded systems applications. For these reasons, in 1974, the Army, Navy, and Air Force each independently proposed the development of a single high-level language for embedded systems.
Noting this widespread interest, in January 1975, Malcolm Currie, director of Defense Research and Engineering, formed the High-Order Language Working Group (HOLWG), initially headed by Lt. Col. William Whitaker of the Air Force. The HOLWG had representatives from all of the military services and liaisons with Great Britain, France, and what was then West Germany. Its initial charter was to do the following:
Identify the requirements for a new DoD high-level language.
Evaluate existing languages to determine whether there was a viable candidate.
Recommend adoption or implementation of a minimal set of programming languages.
In April 1975, the HOLWG produced the Strawman requirements document for the new language (Department of Defense, 1975a). This was distributed to military branches, federal agencies, selected industrial and university representatives, and interested parties in Europe.
The Strawman document was followed by Woodenman (Department of Defense, 1975b) in August 1975, Tinman (Department of Defense, 1976) in January 1976, Ironman (Department of Defense, 1977) in January 1977, and finally Steelman (Department of Defense, 1978) in June 1978.
After a tedious process, the many submitted proposals for the language were narrowed down to four finalists, all of which were based on Pascal. In May 1979, the Cii Honeywell/Bull language design proposal was chosen from the four finalists as the design that would be used. The Cii Honeywell/Bull design team in France, the only foreign competitor among the final four, was led by Jean Ichbiah.
In the spring of 1979, Jack Cooper of the Navy Materiel Command recommended the name for the new language, Ada, which was then adopted. The name commemorates Augusta Ada Byron (1815–1852), countess of Lovelace, mathematician, and daughter of poet Lord Byron. She is generally recognized as being the world’s first programmer. She worked with Charles Babbage on his mechanical computers, the Difference and Analytical Engines, writing programs for several numerical processes.
The design and the rationale for Ada were published by ACM in its SIGPLAN Notices (ACM, 1979) and distributed to a readership of more than 10,000 people. A public test and evaluation conference was held in October 1979 in Boston, with representatives from over 100 organizations from the United States and Europe. By November, more than 500 language reports had been received from 15 different countries. Most of the reports suggested small modifications rather than drastic changes and outright rejections. Based on the language reports, the next version of the requirements specification, the Stoneman document (Department of Defense, 1980), was released in February 1980.
A revised version of the language design was completed in July 1980 and was accepted as MIL-STD 1815, the standard Ada Language Reference Manual. The number 1815 was chosen because it was the year of the birth of Augusta Ada Byron. Another revised version of the Ada Language Reference Manual was released in July 1982. In 1983, the American National Standards Institute standardized Ada. This “final” official version is described in Goos and Hartmanis (1983). The Ada language design was then frozen for a minimum of five years.
This subsection briefly describes four of the major contributions of the Ada language.
Packages in the Ada language provide the means for encapsulating data objects, specifications for data types, and procedures. This, in turn, provides the support for the use of data abstraction in program design.
The Ada language includes extensive facilities for exception handling, which allow the programmer to gain control after any one of a wide variety of exceptions, or run-time errors, has been detected.
Program units can be generic in Ada. For example, it is possible to write a sort procedure that uses an unspecified type for the data to be sorted. Such a generic procedure must be instantiated for a specified type before it can be used, which is done with a statement that causes the compiler to generate a version of the procedure with the given type. The availability of such generic units increases the range of program units that might be reused, rather than duplicated, by programmers.
The Ada language also provides for concurrent execution of special program units, named tasks, using the rendezvous mechanism. Rendezvous is the name of a method of intertask communication and synchronization.
Perhaps the most important aspects of the design of the Ada language to consider are the following:
Because the design was competitive, there were no limits on participation.
The Ada language embodies most of the concepts of software engineering and language design of the late 1970s. Although one can question the actual approaches used to incorporate these features, as well as the wisdom of including such a large number of features in a language, most agree that the features are valuable.
Although most people did not anticipate it, the development of a compiler for the Ada language was a difficult task. Only in 1985, almost four years after the language design was completed, did truly usable Ada compilers begin to appear.
The most serious criticism of Ada in its first few years was that it was too large and too complex. In particular, Hoare (1981) stated that it should not be used for any application where reliability is critical, which is precisely the type of application for which it was designed. On the other hand, others have praised it as the epitome of language design for its time. In fact, even Hoare eventually softened his view of the language.
The following is an example of an Ada program:
-- Ada Example Program
-- Input:  An integer, List_Len, where List_Len is less
--         than 100, followed by List_Len integer values
-- Output: The number of input values that are greater
--         than the average of all input values
with Ada.Text_IO, Ada.Integer_Text_IO;
use Ada.Text_IO, Ada.Integer_Text_IO;
procedure Ada_Ex is
  type Int_List_Type is array (1..99) of Integer;
  Int_List : Int_List_Type;
  List_Len, Sum, Average, Result : Integer;
begin
  Result := 0;
  Sum := 0;
  Get (List_Len);
  if (List_Len > 0) and (List_Len < 100) then
    -- Read input data into an array and compute the sum
    for Counter in 1 .. List_Len loop
      Get (Int_List(Counter));
      Sum := Sum + Int_List(Counter);
    end loop;
    -- Compute the average
    Average := Sum / List_Len;
    -- Count the number of values that are > average
    for Counter in 1 .. List_Len loop
      if Int_List(Counter) > Average then
        Result := Result + 1;
      end if;
    end loop;
    -- Print result
    Put ("The number of values > average is:");
    Put (Result);
    New_Line;
  else
    Put_Line ("Error-input list length is not legal");
  end if;
end Ada_Ex;
Two of the most important new features of Ada 95 are described briefly in the following paragraphs. In the remainder of the book, we will use the name Ada 83 for the original version and Ada 95 (its actual name) for the later version when it is important to distinguish between the two versions. In discussions of language features common to both versions, we will use the name Ada. The Ada 95 standard language is defined in ARM (1995).
The type derivation mechanism of Ada 83 is extended in Ada 95 to allow adding new components to those inherited from a base class. This provides for inheritance, a key ingredient in object-oriented programming languages. Dynamic binding of subprogram calls to subprogram definitions is accomplished through subprogram dispatching, which is based on the tag value of derived types through classwide types. This feature provides for polymorphism, another principal feature of object-oriented programming.
The rendezvous mechanism of Ada 83 provided only a cumbersome and inefficient means of sharing data among concurrent processes. It was necessary to introduce a new task to control access to the shared data. The protected objects of Ada 95 offer an attractive alternative to this. The shared data is encapsulated in a syntactic structure that controls all access to the data, either by rendezvous or by subprogram call.
It is widely believed that the popularity of Ada 95 suffered because the Department of Defense stopped requiring its use in military software systems. There were, of course, other factors that hindered its growth in popularity. Most important among these was the widespread acceptance of C++ for object-oriented programming, which occurred before Ada 95 was released.
Ada 2005 added several features to Ada 95. Among these were interfaces (similar to those of Java), more control of scheduling algorithms, and synchronized interfaces.
Ada is widely used in both commercial and defense avionics, air traffic control, and rail transportation, as well as in other areas.
Smalltalk was the first programming language that fully supported object-oriented programming. It is therefore an important part of any discussion of the evolution of programming languages.
The concepts that led to the development of Smalltalk originated in the Ph.D. dissertation work of Alan Kay in the late 1960s at the University of Utah (Kay, 1969). Kay had the remarkable foresight to predict the future availability of powerful desktop computers. Recall that the first microcomputer systems were not marketed until the mid-1970s, and they were only remotely related to the machines envisioned by Kay, which he expected to execute a million or more instructions per second and to contain several megabytes of memory. Such machines, in the form of workstations, became widely available only in the early 1980s.
Kay believed that desktop computers would be used by nonprogrammers and thus would need very powerful human-interfacing capabilities. The computers of the late 1960s were largely batch oriented and were used exclusively by professional programmers and scientists. For use by nonprogrammers, Kay determined, a computer would have to be highly interactive and use sophisticated graphics in its user interface. Some of the graphics concepts came from the LOGO experience of Seymour Papert, in which graphics were used to aid children in the use of computers (Papert, 1980).
Kay originally envisioned a system he called Dynabook, which was meant to be a general information processor. It was based in part on the Flex language, which he had helped design. Flex was based primarily on SIMULA 67. Dynabook used the paradigm of the typical desk, on which there are a number of papers, some partially covered. The top sheet is often the focus of attention, with the others temporarily out of focus. The display of Dynabook would model this scene, using screen windows to represent various sheets of paper on the desktop. The user would interact with such a display both through keystrokes and by touching the screen with his or her fingers. After the preliminary design of Dynabook earned him a Ph.D., Kay’s goal became to see such a machine constructed.
Kay found his way to the Xerox Palo Alto Research Center (Xerox PARC) and presented his ideas on Dynabook. This led to his employment there and the subsequent birth of the Learning Research Group at Xerox. The first charge of the group was to design a language to support Kay’s programming paradigm and implement it on the best personal computer then available. These efforts resulted in an “Interim” Dynabook, consisting of a Xerox Alto workstation and Smalltalk-72 software. Together, they formed a research tool for further development. A number of research projects were conducted with this system, including several experiments to teach programming to children. Along with the experiments came further developments, leading to a sequence of languages that ended with Smalltalk-80. As the language grew, so did the power of the hardware on which it resided. By 1980, both the language and the Xerox hardware nearly matched the early vision of Alan Kay.
The Smalltalk world is populated by nothing but objects, from integer constants to large complex software systems. All computing in Smalltalk is done by the same uniform technique: sending a message to an object to invoke one of its methods. A reply to a message is an object, which either returns the requested information or simply notifies the sender that the requested processing has been completed. The fundamental difference between a message and a subprogram call is this: A message is sent to a data object, specifically to one of the methods defined for the object. The called method is then executed, often modifying the data of the object to which the message was sent. A subprogram call, by contrast, is a message to the code of a subprogram, and the data to be processed by the subprogram is usually sent to it as a parameter.
In Smalltalk, object abstractions are classes, which are very similar to the classes of SIMULA 67. Instances of a class can be created and are then the objects of the program.
The syntax of Smalltalk is unlike that of most other programming languages, in large part because of the use of messages, rather than arithmetic and logic expressions and conventional control statements. One of the Smalltalk control constructs is illustrated in the example in the next subsection.
Smalltalk has done a great deal to promote two separate aspects of computing: graphical user interfaces and object-oriented programming. The windowing systems that are now the dominant method of user interfaces to software systems grew out of Smalltalk. Today, the most significant software design methodologies and programming languages are object oriented. Although the origin of some of the ideas of object-oriented languages came from SIMULA 67, they reached maturation in Smalltalk. It is clear that Smalltalk’s impact on the computing world is extensive and will be long-lived.
The following is an example of a Smalltalk class definition:
"Smalltalk Example Program"
"The following is a class definition, instantiations of which can draw equilateral polygons of any number of sides"
class name                 Polygon
superclass                 Object
instance variable names    ourPen
                           numSides
                           sideLength
"Class methods"
  "Create an instance"
  new
    ^ super new getPen
  "Get a pen for drawing polygons"
  getPen
    ourPen <- Pen new defaultNib: 2
"Instance methods"
  "Draw a polygon"
  draw
    numSides timesRepeat: [ourPen go: sideLength;
                           turn: 360 // numSides]
  "Set length of sides"
  length: len
    sideLength <- len
  "Set number of sides"
  sides: num
    numSides <- num
The origins of C were discussed in Section 2.12; the origins of Simula 67 were discussed in Section 2.10; the origins of Smalltalk were discussed in Section 2.15. C++ builds language facilities, borrowed from Simula 67, on top of C to support much of what Smalltalk pioneered. C++ has evolved from C through a sequence of modifications to improve its imperative features and to add constructs to support object-oriented programming.
The first step from C toward C++ was made by Bjarne Stroustrup at Bell Laboratories in 1980. The initial modifications to C included the addition of function parameter type checking and conversion and, more significantly, classes, which are related to those of SIMULA 67 and Smalltalk. Also included were derived classes, public/private access control of inherited components, constructor and destructor methods, and friend classes. During 1981, inline functions, default parameters, and overloading of the assignment operator were added. The resulting language was called C with Classes and is described in Stroustrup (1983).
It is useful to consider some goals of C with Classes. The primary goal was to provide a language in which programs could be organized as they could be organized in SIMULA 67—that is, with classes and inheritance. A second important goal was that there should be little or no performance penalty relative to C. For example, array index range checking was not even considered because a significant performance disadvantage, relative to C, would result. A third goal of C with Classes was that it could be used for every application for which C could be used, so virtually none of the features of C would be removed, not even those considered to be unsafe.
By 1984, this language was extended by the inclusion of three features: virtual methods, which provide dynamic binding of method calls to specific method definitions; method name and operator overloading; and reference types. This version of the language was called C++. It is described in Stroustrup (1984).
In 1985, the first available implementation appeared: a system named Cfront, which translated C++ programs into C programs. This version of Cfront and the version of C++ it implemented were named Release 1.0. It is described in Stroustrup (1986).
Between 1985 and 1989, C++ continued to evolve, based largely on user reactions to the first distributed implementation. This next version was named Release 2.0. Its Cfront implementation was released in June 1989. The most important features added to C++ Release 2.0 were support for multiple inheritance (classes with more than one parent class) and abstract classes, along with some other enhancements. Abstract classes are described in Chapter 12.
Release 3.0 of C++ evolved between 1989 and 1990. It added templates, which provide parameterized types, and exception handling. The current version of C++, which was standardized in 1998, is described in ISO (1998).
In 2002, Microsoft released its .NET computing platform, which included a new version of C++, named Managed C++, or MC++. MC++ extends C++ to provide access to the functionality of the .NET Framework. The additions include properties, delegates, interfaces, and a reference type for garbage-collected objects. Properties are discussed in Chapter 11. Delegates are briefly discussed in the introduction to C# in Section 2.19. Because .NET does not support multiple inheritance, neither does MC++.
Because C++ has both functions and methods, it supports both procedural and object-oriented programming.
Operators in C++ can be overloaded, meaning the user can define new meanings for existing operators when they are applied to user-defined types. C++ methods can also be overloaded, meaning the user can define more than one method with the same name, provided either the numbers or the types of their parameters are different.
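The two kinds of overloading described above can be sketched as follows. This is only a minimal illustration: the names Vec2, Scaler, scaled, and overloading_demo are hypothetical and do not come from the text.

```cpp
#include <cassert>

// Hypothetical user-defined type, introduced only for this illustration.
struct Vec2 {
    double x;
    double y;
};

// Operator overloading: gives the existing + operator a meaning
// for operands of the user-defined type Vec2.
Vec2 operator+(const Vec2& a, const Vec2& b) {
    return Vec2{a.x + b.x, a.y + b.y};
}

// Method overloading: two methods share the name scaled but differ in
// the number of parameters, so the compiler can tell the calls apart.
class Scaler {
public:
    // Uniform scaling with a single factor.
    Vec2 scaled(const Vec2& v, double factor) const {
        return Vec2{v.x * factor, v.y * factor};
    }
    // Independent scaling factors for each component.
    Vec2 scaled(const Vec2& v, double fx, double fy) const {
        return Vec2{v.x * fx, v.y * fy};
    }
};

// Small usage example; call it from main() to run the checks.
void overloading_demo() {
    Vec2 sum = Vec2{1.0, 2.0} + Vec2{3.0, 4.0};  // calls operator+
    assert(sum.x == 4.0 && sum.y == 6.0);

    Scaler s;
    Vec2 uniform = s.scaled(sum, 2.0);       // selects the two-parameter version
    Vec2 stretched = s.scaled(sum, 1.0, 0.5); // selects the three-parameter version
    assert(uniform.x == 8.0 && uniform.y == 12.0);
    assert(stretched.x == 4.0 && stretched.y == 3.0);
}
```

In both cases the compiler chooses the definition at compile time from the operand or parameter types, which is what distinguishes overloading from the dynamic binding of virtual methods.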
Dynamic binding in C++ is provided by virtual methods. These methods define type-dependent operations, using overridden methods, within a collection of classes that are related through inheritance. A pointer to an object of class A can also point to objects of classes that have class A as an ancestor. When a virtual method is called through such a pointer, the version defined for the type of the object currently pointed to is chosen dynamically.
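A minimal sketch of this behavior follows. The classes Shape, Circle, and Square and the function describe are hypothetical names chosen for the illustration; the example assumes a compiler that supports C++11 or later (for the override specifier).

```cpp
#include <cassert>
#include <string>

// Hypothetical class hierarchy, introduced only for this illustration.
class Shape {
public:
    virtual ~Shape() = default;
    // Virtual method: the version executed depends on the run-time
    // type of the object, not on the type of the pointer.
    virtual std::string name() const { return "shape"; }
};

class Circle : public Shape {
public:
    // Redefines the virtual method for Circle objects.
    std::string name() const override { return "circle"; }
};

class Square : public Shape {
public:
    // Redefines the virtual method for Square objects.
    std::string name() const override { return "square"; }
};

// The parameter is a pointer to the ancestor class, yet the call is
// bound dynamically to the method of the object actually pointed to.
std::string describe(const Shape* s) {
    return s->name();
}

// Small usage example; call it from main() to run the checks.
void dynamic_binding_demo() {
    Circle c;
    Square q;
    Shape plain;
    assert(describe(&c) == "circle");
    assert(describe(&q) == "square");
    assert(describe(&plain) == "shape");
}
```

If name were not declared virtual, all three calls in describe would be bound statically to the Shape version.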
Both methods and classes can be templated, which means that they can be parameterized. For example, a method can be written as a templated method to allow it to have versions for a variety of parameter types. Classes enjoy the same flexibility.
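As a rough sketch of what parameterization means here, the following shows one templated function and one templated class. The names larger, Pair, and template_demo are hypothetical and are used only for illustration.

```cpp
#include <cassert>
#include <string>

// Templated function: T is a type parameter. The compiler generates a
// separate version for each type with which the function is used.
template <typename T>
T larger(const T& a, const T& b) {
    return (a < b) ? b : a;
}

// Templated class: a hypothetical pair of two values of the same type.
template <typename T>
class Pair {
public:
    Pair(const T& first, const T& second) : first_(first), second_(second) {}
    // Returns the larger of the two stored values.
    T max() const { return larger(first_, second_); }
private:
    T first_;
    T second_;
};

// Small usage example; call it from main() to run the checks.
void template_demo() {
    assert(larger(3, 7) == 7);  // T is deduced as int
    assert(larger<std::string>("ada", "c++") == "c++");  // T is given explicitly
    Pair<double> p(2.5, 1.5);   // class instantiated for double
    assert(p.max() == 2.5);
}
```

The only requirement this sketch places on T is that values of the type can be copied and compared with the < operator.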
C++ supports multiple inheritance. The exception-handling constructs of C++ are discussed in Chapter 14.
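A minimal sketch of multiple inheritance, a class with more than one parent class, might look like the following. The classes Printer, Scanner, and Copier are hypothetical and serve only as an illustration.

```cpp
#include <cassert>
#include <string>

// Hypothetical parent classes, introduced only for this illustration.
class Printer {
public:
    std::string print() const { return "printing"; }
};

class Scanner {
public:
    std::string scan() const { return "scanning"; }
};

// A class with two parent classes: it inherits the members of both.
class Copier : public Printer, public Scanner {
public:
    // Combines behavior inherited from each parent.
    std::string copy() const { return scan() + " then " + print(); }
};

// Small usage example; call it from main() to run the checks.
void multiple_inheritance_demo() {
    Copier c;
    assert(c.print() == "printing");  // inherited from Printer
    assert(c.scan() == "scanning");   // inherited from Scanner
    assert(c.copy() == "scanning then printing");
}
```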
C++ rapidly became and remains a widely used language. One factor in its popularity is the availability of good and inexpensive compilers. Another factor is that it is almost completely backward compatible with C (meaning that C programs, with few changes, can be compiled as C++ programs), and in most implementations it is possible to link C++ code with C code, so it is relatively easy for the many C programmers to learn C++. Finally, at the time C++ first appeared, when object-oriented programming began to receive widespread interest, C++ was the only available language that was suitable for large commercial software projects.
On the negative side, because C++ is a very large and complex language, it clearly suffers drawbacks similar to those of PL/I. It inherited most of the insecurities of C, which make it less safe than languages such as Ada and Java.
Beginning with Mac OS X in 2002, Apple systems software was written in Objective-C. Swift was developed by Apple as an improved replacement for Objective-C. Chris Lattner began work on Swift in 2010. The language was introduced in 2014, with version 2 being introduced in 2015. The first version was proprietary, but the second is open source. It is currently implemented under all of Apple’s operating systems, as well as on Linux.
Among the features of Swift are a tuple data type; an option type (variables of this type can have a special no-value value); protocols (similar to Java interfaces); two categories of types, class and struct, which support reference types and value types, as in C#; generic types; the absence of pointers; a safer switch construct, in which the default is not to fall through to the next option; and the requirement that all statement collections, including a single statement, be enclosed in braces in all control constructs.
Statements need not be terminated with semicolons unless two or more statements appear on the same line. Types of data need not be declared, because type inferencing is used. Unlike C and C++, assignment statements do not return a value, so x = 0 is not legal in a Boolean expression. This eliminates a common error in C and C++ programs, in which x = 0 is typed instead of x == 0. Heap-allocated objects are automatically reclaimed, using reference counters.
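The C and C++ error that Swift rules out can be sketched in C++ as follows. The function names are hypothetical. The extra pair of parentheses around the assignment is there only because many compilers otherwise warn about an assignment used as a condition.

```cpp
#include <cassert>

// Intended test: is x equal to zero?
bool is_zero_intended(int x) {
    return x == 0;  // comparison
}

// Mistyped test: = was written where == was meant. This is legal C++,
// because an assignment is an expression that yields a value.
bool is_zero_mistyped(int x) {
    // Assigns 0 to x, then uses the assigned value (0, which converts
    // to false) as the condition, so the branch is never taken.
    if ((x = 0)) {
        return true;
    }
    return false;
}

// Small usage example; call it from main() to run the checks.
void assignment_demo() {
    assert(is_zero_intended(0));
    assert(!is_zero_intended(5));
    // The mistyped version gives the wrong answer for zero and, in fact,
    // returns false for every argument.
    assert(!is_zero_mistyped(0));
    assert(!is_zero_mistyped(5));
}
```

Because a Swift assignment produces no value, the equivalent of is_zero_mistyped would be rejected at compile time rather than silently misbehaving.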
Swift programs can interact with Objective-C code and Swift uses the same libraries as that language. Swift is already the tenth most popular programming language, according to the TIOBE Community Report.
Delphi (Lischner, 2000) is a hybrid language, similar to C++ and Objective-C, in that it was created by adding object-oriented support, among other things, to an existing imperative language, in this case Pascal. Apple designed an object-oriented version of Pascal, named Object Pascal, but subsequently dropped the project. Borland, which had developed Turbo Pascal for Windows, also designed an object-oriented version of Pascal, based on Turbo Pascal, also named Object Pascal. For several reasons, Borland renamed Object Pascal as Delphi. The first product with that name, which included an integrated development environment (IDE), was released in 1995. Some consider the IDE to be Delphi and the underlying programming language to be Object Pascal. Other suppliers of similar products continue to refer to the language as Object Pascal.
Many of the differences between C++ and Delphi are a result of their predecessor languages and the surrounding programming cultures from which they are derived. Because C is a powerful but potentially unsafe language, C++ also fits that description, at least in the areas of array subscript range checking, pointer arithmetic, and its numerous type coercions. Similarly, because Pascal is more elegant and safer than C, Delphi is more elegant and safer than C++. Delphi is also less complex than C++. For example, Delphi does not include user-defined operator overloading, generic subprograms, and parameterized classes, all of which are part of C++.
Delphi was designed by Anders Hejlsberg, who had previously developed the Turbo Pascal system. Hejlsberg, who moved to Microsoft in 1996, was the lead designer of C#.
Java’s designers started with C++, removed some constructs, changed some, and added a few others. The resulting language provides much of the power and flexibility of C++, but in a smaller, simpler, and safer language. Since that initial design, Java has grown considerably.
Java, like many programming languages, was designed for an application for which there appeared to be no satisfactory existing language. In 1990, Sun Microsystems determined there was a need for a programming language for embedded consumer electronic devices, such as toasters, microwave ovens, and interactive TV systems. Reliability was one of the primary goals for such a language. It may not seem that reliability would be an important factor in the software for a microwave oven. If an oven had malfunctioning software, it probably would not pose a grave danger to anyone and most likely would not lead to large legal settlements. However, if the software in a particular model was found to be erroneous after a million units had been manufactured and sold, their recall would entail significant cost. Therefore, reliability is an important characteristic of the software in consumer electronic products.
After considering C and C++, it was decided that neither would be satisfactory for developing software for consumer electronic devices. Although C was relatively small, it did not provide support for object-oriented programming, which they deemed a necessity. C++ supported object-oriented programming, but it was judged to be too large and complex, in part because it also supported procedure-oriented programming. It was also believed that neither C nor C++ provided the necessary level of reliability. So, a new language, later named Java, was designed. Its design was guided by the fundamental goal of providing greater simplicity and reliability than C++ was believed to provide.
Although the initial impetus for Java was consumer electronics, none of the products with which it was used in its early years were ever marketed. Starting in 1993, when the World Wide Web became widely used, largely because of the new graphical browsers, Java was found to be a useful tool for Web programming. In particular, Java applets, which are relatively small Java programs that are interpreted in Web browsers and whose output can be included in displayed Web documents, quickly became very popular in the middle to late 1990s. In the first few years of Java popularity, the Web was its most common application.
The Java design team was headed by James Gosling, who had previously designed the UNIX emacs editor and the NeWS windowing system.
As we stated previously, Java is based on C++ but it was specifically designed to be smaller, simpler, and more reliable. Like C++, Java has both classes and primitive types. Java arrays are instances of a predefined class, whereas in C++ they are not, although many C++ users build wrapper classes for arrays to add features like index range checking, which is implicit in Java.
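The implicit index range checking mentioned above can be seen in a small sketch. The class name RangeCheckDemo and the helper elementOr are hypothetical names chosen for this illustration; only the array behavior and ArrayIndexOutOfBoundsException come from Java itself.

```java
// Sketch (hypothetical names): every Java array access is range checked at run time.
class RangeCheckDemo {
    // Returns values[index], or fallback when the index is outside the array.
    static int elementOr(int[] values, int index, int fallback) {
        try {
            return values[index];      // the implicit range check happens here
        } catch (ArrayIndexOutOfBoundsException e) {
            return fallback;           // the bad access was detected, not silently performed
        }
    }

    public static void main(String[] args) {
        int[] list = {10, 20, 30};
        System.out.println(elementOr(list, 1, -1));   // index in range
        System.out.println(elementOr(list, 3, -1));   // index out of range
        System.out.println(list.length);              // arrays are objects that carry their length
    }
}
```

In C++, the equivalent out-of-range access on a built-in array has undefined behavior, which is why wrapper classes are often written there.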
Java does not have pointers, but its reference types provide some of the capabilities of pointers. These references are used to point to class instances. All objects are allocated on the heap. References are always implicitly dereferenced when necessary, so they behave more like ordinary scalar variables.
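A minimal sketch of reference behavior follows. The names ReferenceDemo, Counter, and aliasDemo are invented for the illustration. It shows that assigning one reference to another copies the reference rather than the object, and that member access needs no explicit dereferencing operator.

```java
// Sketch (hypothetical names): references are copied, objects are shared.
class ReferenceDemo {
    static class Counter {
        int count = 0;
    }

    static int aliasDemo() {
        Counter a = new Counter();   // the object is allocated on the heap
        Counter b = a;               // copies the reference, not the object
        b.count = 5;                 // implicit dereference: no * or -> is written
        return a.count;              // a and b refer to the same object
    }

    public static void main(String[] args) {
        System.out.println(aliasDemo());   // prints 5
    }
}
```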
Java has a primitive Boolean type named boolean, used mainly for the control expressions of its control statements (such as if and while). Unlike C and C++, arithmetic expressions cannot be used for control expressions.
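The restriction on control expressions can be illustrated as follows. This is only a sketch, and the names BooleanControlDemo and countNonZero are hypothetical. The commented-out line shows the C-style form that a Java compiler rejects.

```java
// Sketch (hypothetical names): control expressions must be of type boolean.
class BooleanControlDemo {
    static int countNonZero(int[] values) {
        int count = 0;
        for (int v : values) {
            // if (v) { count++; }   // legal in C and C++, a compile-time error in Java
            if (v != 0) {            // the comparison yields a boolean value
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countNonZero(new int[]{0, 3, 0, -2}));   // prints 2
    }
}
```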
One significant difference between Java and many of its predecessors that support object-oriented programming, including C++, is that it is not possible to write stand-alone subprograms in Java. All Java subprograms are methods and are defined in classes. Furthermore, methods can be called through a class or object only. One consequence of this is that while C++ supports both procedural and object-oriented programming, Java supports object-oriented programming only.
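As a brief illustration, the following sketch defines one method that is called through the class and one that is called through an object. The class Temperature and its members are hypothetical; the point is only that neither computation can be written as a stand-alone subprogram.

```java
// Sketch (hypothetical names): every subprogram is a method inside a class.
class Temperature {
    private final double celsius;

    Temperature(double celsius) {
        this.celsius = celsius;
    }

    // Class (static) method: called through the class name.
    static double toFahrenheit(double celsius) {
        return celsius * 9.0 / 5.0 + 32.0;
    }

    // Instance method: called through an object.
    double fahrenheit() {
        return toFahrenheit(celsius);
    }

    public static void main(String[] args) {
        System.out.println(Temperature.toFahrenheit(100.0));     // prints 212.0
        System.out.println(new Temperature(0.0).fahrenheit());   // prints 32.0
    }
}
```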
Another important difference between C++ and Java is that C++ supports multiple inheritance directly in its class definitions. Java supports only single inheritance of classes, although some of the benefits of multiple inheritance can be gained by using its interface construct.
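The following sketch suggests how interfaces recover some of the benefits of multiple inheritance. All of the type names here (Shape, Circle, Describable, Scalable) are invented for the example. Circle has exactly one superclass but can still be used wherever either interface type is expected.

```java
// Sketch (hypothetical names): one superclass, several interfaces.
class InterfaceDemo {
    interface Describable { String describe(); }
    interface Scalable { void scale(double factor); }

    static class Shape {
        String name() { return "shape"; }
    }

    // Single inheritance of a class, plus two interface implementations.
    static class Circle extends Shape implements Describable, Scalable {
        private double radius;

        Circle(double radius) { this.radius = radius; }

        @Override String name() { return "circle"; }
        @Override public String describe() { return name() + " r=" + radius; }
        @Override public void scale(double factor) { radius *= factor; }
    }

    static String demo() {
        Circle c = new Circle(1.5);
        Scalable s = c;        // the same object viewed as a Scalable
        s.scale(2.0);
        Describable d = c;     // and as a Describable
        return d.describe();
    }

    public static void main(String[] args) {
        System.out.println(demo());   // prints circle r=3.0
    }
}
```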
Among the C++ constructs that were not copied into Java are structs and unions.
Java includes a relatively simple form of concurrency control through its synchronized modifier, which can appear on methods and blocks. In either case, it causes a lock to be attached. The lock ensures mutually exclusive access or execution. In Java, it is relatively easy to create concurrent processes, which in Java are called threads.
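A small sketch of both uses of synchronized is given below. The class SynchronizedDemo and its methods are hypothetical names. Several threads increment a shared counter; because increment holds the lock attached to the object, no updates are lost.

```java
// Sketch (hypothetical names): synchronized on a method and on a block.
class SynchronizedDemo {
    private int count = 0;

    // Synchronized method: the lock attached to this object is held during the call.
    synchronized void increment() {
        count++;
    }

    // Synchronized block: the same lock, acquired for part of a method.
    int current() {
        synchronized (this) {
            return count;
        }
    }

    static int runThreads(int threads, int perThread) {
        SynchronizedDemo shared = new SynchronizedDemo();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    shared.increment();
                }
            });
            workers[i].start();
        }
        try {
            for (Thread t : workers) {
                t.join();   // wait for every worker to finish
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return shared.current();
    }

    public static void main(String[] args) {
        System.out.println(runThreads(4, 1000));   // prints 4000
    }
}
```

Without the synchronized modifier on increment, the final count could be smaller than 4000, because count++ is not an atomic operation.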
Java uses implicit storage deallocation for its objects, often called garbage collection. This frees the programmer from needing to delete objects explicitly when they are no longer needed. Programs written in languages that do not have garbage collection often suffer from what is sometimes called memory leakage, which means that storage is allocated but never deallocated. This can obviously lead to eventual depletion of all available storage. Object deallocation is discussed in detail in Chapter 6.
Unlike C and C++, Java includes assignment type coercions (implicit type conversions) only if they are widening (from a “smaller” type to a “larger” type). So int to float coercions are done across the assignment operator, but float to int coercions are not.
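The difference between the two directions of coercion can be sketched as follows. The names CoercionDemo, widen, and narrow are hypothetical. The commented-out assignment is the one a Java compiler rejects.

```java
// Sketch (hypothetical names): widening is implicit, narrowing needs a cast.
class CoercionDemo {
    static float widen(int i) {
        float f = i;       // widening coercion: done implicitly across the assignment
        return f;
    }

    static int narrow(float f) {
        // int i = f;      // narrowing: a compile-time error in Java
        int i = (int) f;   // an explicit cast is required; it truncates toward zero
        return i;
    }

    public static void main(String[] args) {
        System.out.println(widen(7));       // prints 7.0
        System.out.println(narrow(3.9f));   // prints 3
    }
}
```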
The designers of Java did well at trimming the excess and/or unsafe features of C++. For example, the elimination of half of the assignment coercions that are done in C++ was clearly a step toward higher reliability. Index range checking of array accesses also makes the language safer. The addition of concurrency enhances the range of applications that can be written in the language, as do the class libraries for graphical user interfaces, database access, and networking.
Java’s portability, at least in intermediate form, has often been attributed to the design of the language, but it is not. Any language can be translated to an intermediate form and “run” on any platform that has a virtual machine for that intermediate form. The price of this kind of portability is the cost of interpretation, which traditionally has been about an order of magnitude more than execution of machine code. The initial version of the Java interpreter, called the Java Virtual Machine (JVM), indeed was at least 10 times slower than equivalent compiled C programs. However, many Java programs are now translated to machine code before being executed, using Just-in-Time (JIT) compilers. This makes the efficiency of Java programs competitive with that of programs in conventionally compiled languages such as C++, at least when array index range checking is not considered.
The use of Java increased faster than that of any other programming language. Initially, this was due to its value in programming dynamic Web documents. Clearly, one of the reasons for Java’s rapid rise to prominence is simply that programmers like its design. Some developers thought C++ was too large and complex to be practical and safe. Java offered them an alternative that has much of the power of C++, but in a simpler, safer language. Another reason is that the compiler/interpreter system for Java is free and easily obtained on the Web. Java is now widely used in a variety of different application areas.
The most recent version of Java, Java SE 8, appeared in 2014. Since the first version, significant features have been added to the language. Among these are an enumeration class, generics, a new iteration construct, lambda expressions, and numerous class libraries.
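Four of the added features can be seen together in one short sketch. The names NewerFeaturesDemo, Level, and countMatching are hypothetical; Predicate, List, and ArrayList are standard library types.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Sketch (hypothetical names): an enumeration, a generic method,
// the for-each iteration construct, and a lambda expression.
class NewerFeaturesDemo {
    enum Level { LOW, MEDIUM, HIGH }

    // Generic method: works for a list of any element type T.
    static <T> int countMatching(List<T> items, Predicate<T> test) {
        int count = 0;
        for (T item : items) {           // for-each iteration construct
            if (test.test(item)) {
                count++;
            }
        }
        return count;
    }

    static int demo() {
        List<Level> levels = new ArrayList<>();
        levels.add(Level.LOW);
        levels.add(Level.HIGH);
        levels.add(Level.MEDIUM);
        levels.add(Level.HIGH);
        // The lambda expression supplies the Predicate.
        return countMatching(levels, level -> level == Level.HIGH);
    }

    public static void main(String[] args) {
        System.out.println(demo());   // prints 2
    }
}
```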
The following is an example of a Java program:
// Java Example Program
// Input: An integer, listlen, where listlen is less
//        than 100, followed by listlen-integer values
// Output: The number of input data that are greater than
//         the average of all input values
import java.io.*;
class IntSort {
  public static void main(String args[]) throws IOException {
    // BufferedReader is used here because DataInputStream.readLine
    // is deprecated
    BufferedReader in =
        new BufferedReader(new InputStreamReader(System.in));
    int listlen,
        counter,
        sum = 0,
        average,
        result = 0;
    int[] intlist = new int[99];
    listlen = Integer.parseInt(in.readLine());
    if ((listlen > 0) && (listlen < 100)) {
      /* Read input into an array and compute the sum */
      for (counter = 0; counter < listlen; counter++) {
        intlist[counter] = Integer.parseInt(in.readLine());
        sum += intlist[counter];
      }
      /* Compute the average */
      average = sum / listlen;
      /* Count the input values that are > average */
      for (counter = 0; counter < listlen; counter++)
        if (intlist[counter] > average) result++;
      /* Print result */
      System.out.println(
          "\nNumber of values > average is:" + result);
    } //** end of then clause of if ((listlen > 0) ...
    else System.out.println(
        "Error-input list length is not legal\n");
  } //** end of method main
} //** end of class IntSort
Scripting languages have evolved over the past 35 years. Early scripting languages were used by putting a list of commands, called a script, in a file to be interpreted. The first of these languages, named sh (for shell), began as a small collection of commands that were interpreted as calls to system subprograms that performed utility functions, such as file management and simple file filtering. To this were added variables, control flow statements, functions, and various other capabilities, and the result is a complete programming language. One of the most powerful and widely known of these is ksh (Bolsky and Korn, 1995), which was developed by David Korn at Bell Laboratories.
Another scripting language is awk, developed by Al Aho, Brian Kernighan, and Peter Weinberger at Bell Laboratories (Aho et al., 1988). awk began as a report-generation language but later became a more general-purpose language.
The Perl language, developed by Larry Wall, was originally a combination of sh and awk. Perl has grown significantly since its beginnings and is now a powerful, although still somewhat primitive, programming language. Although it is still often called a scripting language, it is actually more similar to a typical imperative language, since it is always compiled, at least into an intermediate language, before it is executed. Furthermore, it has all the constructs to make it applicable to a wide variety of areas of computational problems.
Perl has a number of interesting features, only a few of which are mentioned in this chapter and discussed later in the book.
Variables in Perl are statically typed and implicitly declared. There are three distinctive namespaces for variables, denoted by the first character of the variables’ names. All scalar variable names begin with dollar signs ($), all array names begin with at signs (@), and all hash names (hashes are briefly described below) begin with percent signs (%). This convention makes variable names in programs more readable than those of most other programming languages.
Perl includes a large number of implicit variables. Some of them are used to store Perl parameters, such as the particular form of newline character or characters that are used in the implementation. Implicit variables are commonly used as default parameters to built-in functions and default operands for some operators. The implicit variables have distinctive—although cryptic—names, such as $! and @_. The implicit variables’ names, like the user-defined variable names, use the three namespaces, so $! is the name of a scalar variable.
Perl’s arrays have two characteristics that set them apart from the arrays of the common imperative languages. First, they have dynamic length, meaning that they can grow and shrink as needed during execution. Second, arrays can be sparse, meaning that there can be gaps between the elements. These gaps do not take space in memory, and the iteration statement used for arrays, foreach, iterates over the missing elements.
Perl includes associative arrays, which are called hashes. These data structures are indexed by strings and are implicitly controlled hash tables. The Perl system supplies the hash function and increases the size of the structure when necessary.
Perl is a powerful, but somewhat dangerous, language. Its scalar type stores both strings and numbers, which are normally stored in double-precision floating-point form. Depending on the context, numbers may be coerced to strings and vice versa. If a string is used in numeric context and the string cannot be converted to a number, zero is used and there is no warning or error message provided for the user. This can lead to errors that are not detected by the compiler or run-time system. Array indexing cannot be checked, because there is no set subscript range for any array. References to nonexistent elements return undef, which is interpreted as zero in numeric context. So, there is also no error detection in array element access.
Perl’s initial use was as a UNIX utility for processing text files. It was and still is widely used as a UNIX system administration tool. When the World Wide Web appeared, Perl achieved widespread use as a common gateway interface language for use with the Web, although it is now rarely used for that purpose. Perl is used as a general-purpose language for a variety of applications, such as computational biology and artificial intelligence.
The following is an example of a Perl program:
# Perl Example Program
# Input: An integer, $listlen, where $listlen is less
#        than 100, followed by $listlen-integer values.
# Output: The number of input values that are greater than
#         the average of all input values.
($sum, $result) = (0, 0);
$listlen = <STDIN>;
if (($listlen > 0) && ($listlen < 100)) {
  # Read input into an array and compute the sum
  for ($counter = 0; $counter < $listlen; $counter++) {
    $intlist[$counter] = <STDIN>;
    $sum += $intlist[$counter];
  } #- end of for (counter ...
  # Compute the average
  $average = $sum / $listlen;
  # Count the input values that are > average
  foreach $num (@intlist) {
    if ($num > $average) { $result++; }
  } #- end of foreach $num ...
  # Print result
  print "Number of values > average is: $result \n";
} #- end of if (($listlen ...
else {
  print "Error--input list length is not legal \n";
}
Use of the Web exploded in the mid-1990s after the first graphical browsers appeared. The need for computation associated with HTML documents, which by themselves are completely static, quickly became critical. Computation on the server side was made possible with the common gateway interface (CGI), which allowed HTML documents to request the execution of programs on the server, with the results of such computations returned to the browser in the form of HTML documents. Computation on the browser end became available with the advent of Java applets. Both of these approaches have now been replaced for the most part by newer technologies, primarily scripting languages.
JavaScript was originally developed by Brendan Eich at Netscape. Its original name was Mocha. It was later renamed LiveScript. In late 1995, LiveScript became a joint venture of Netscape and Sun Microsystems and its name was changed to JavaScript. JavaScript has gone through extensive evolution, moving from version 1.0 to version 1.5 by adding many new features and capabilities. A language standard for JavaScript was developed in the late 1990s by the European Computer Manufacturers Association (ECMA) as ECMA-262. This standard has also been approved by the International Organization for Standardization (ISO) as ISO/IEC 16262. Microsoft’s version of JavaScript is named JScript .NET.
Although a JavaScript interpreter could be embedded in many different applications, its most common use is embedded in Web browsers. JavaScript code is embedded in HTML documents and interpreted by the browser when the documents are displayed. The primary uses of JavaScript in Web programming are to validate form input data and create dynamic HTML documents.
In spite of its name, JavaScript is related to Java only through the use of similar syntax. Java is strongly typed, but JavaScript is dynamically typed (see Chapter 5). JavaScript’s character strings and its arrays have dynamic length. Because of this, array indices are not checked for validity, although this is required in Java. Java fully supports class-based object-oriented programming, but JavaScript does not support inheritance in that form: its objects inherit from other objects through prototypes rather than through class definitions.
One of the most important uses of JavaScript is for dynamically creating and modifying HTML documents. JavaScript defines an object hierarchy that matches a hierarchical model of an HTML document, which is defined by the Document Object Model. Elements of an HTML document are accessed through these objects, providing the basis for dynamic control of the elements of documents.
Following is a JavaScript script for the problem previously solved in several languages in this chapter. Note that it is assumed that this script will be called from an HTML document and interpreted by a Web browser.
// example.js
// Input: An integer, listLen, where listLen is less
//        than 100, followed by listLen-numeric values
// Output: The number of input values that are greater
//         than the average of all input values
var intList = new Array(99);
var listLen, counter, average, sum = 0, result = 0;
listLen = prompt (
    "Please type the length of the input list", "");
if ((listLen > 0) && (listLen < 100)) {
  // Get the input and compute its sum
  for (counter = 0; counter < listLen; counter++) {
    intList[counter] = prompt (
        "Please type the next number", "");
    sum += parseInt(intList[counter]);
  }
  // Compute the average
  average = sum / listLen;
  // Count the input values that are > average
  for (counter = 0; counter < listLen; counter++)
    if (intList[counter] > average) result++;
  // Display the results
  document.write("Number of values > average is: ",
      result, "<br />");
} else
  document.write(
      "Error - input list length is not legal <br />");
PHP (Tatroe et al., 2013) was developed by Rasmus Lerdorf, an employee of the Apache Group, in 1994. His initial motivation was to provide a tool to help track visitors to his personal Web site. In 1995, he developed a package called Personal Home Page Tools, which became the first publicly distributed version of PHP. Originally, PHP was an abbreviation for Personal Home Page. Later, its user community began using the recursive name PHP: Hypertext Preprocessor, which subsequently forced the original name into obscurity. PHP is now developed, distributed, and supported as an open-source product. PHP processors are resident on most Web servers.
PHP is an HTML-embedded server-side scripting language specifically designed for Web applications. PHP code is interpreted on the Web server when an HTML document in which it is embedded has been requested by a browser. PHP code usually produces HTML code as output, which replaces the PHP code in the HTML document. Therefore, a Web browser never sees PHP code.
PHP is similar to JavaScript in its syntactic appearance, the dynamic nature of its strings and arrays, and its use of dynamic typing. PHP’s arrays are a combination of JavaScript’s arrays and Perl’s hashes.
The original version of PHP did not support object-oriented programming. Abstract classes, interfaces, destructors, and access controls for class members have since been added to the language.
PHP allows simple access to HTML form data, so form processing is easy with PHP. PHP provides support for many different database management systems. This makes it a useful language for building programs that need Web access to databases.
The current version of PHP is 7, released in 2015.
Python (Lutz, 2013) is an object-oriented interpreted scripting language. Its initial design was by Guido van Rossum at Stichting Mathematisch Centrum in the Netherlands in the early 1990s. Its development is being continued by the Python Software Foundation. Python is being used for the same kinds of applications as Perl: system administration and other relatively small computing tasks. Python is an open-source system that is available for most common computing platforms. The Python implementation is available at www.python.org, which also has extensive information regarding Python.
Python’s syntax is not based directly on any commonly used language. It is type checked, but dynamically typed. Instead of arrays, Python includes three kinds of data structures: lists; immutable lists, which are called tuples; and hashes, which are called dictionaries. There is a collection of list methods, such as append, insert, remove, and sort, as well as a collection of methods for dictionaries, such as keys, values, copy, and has_key (the last of these exists only in Python 2; Python 3 uses the in operator instead). Python also supports list comprehensions, which originated with the Haskell language. List comprehensions are discussed in Section 15.8.
Python is object oriented, includes the pattern-matching capabilities of Perl, and has exception handling. Garbage collection is used to reclaim objects when they are no longer needed.
Support for form processing is provided by the cgi module. Modules that support cookies, networking, and database access are also available.
Python includes support for concurrency with its threads, as well as support for network programming with its sockets. It also has more support for functional programming than other nonfunctional programming languages.
One of the more interesting features of Python is that it can be easily extended by any user. The modules that support the extensions can be written in any compiled language. Extensions can add functions, variables, and object types. These extensions are implemented as additions to the Python interpreter.
Ruby (Thomas et al., 2005) was designed by Yukihiro Matsumoto (aka Matz) in the early 1990s and released in 1996. Since then it has continually evolved. The motivation for Ruby was dissatisfaction of its designer with Perl and Python. Although both Perl and Python support object-oriented programming,14 neither is a pure object-oriented language, at least in the sense that each has primitive (nonobject) types and each supports functions.
The primary characterizing feature of Ruby is that it is a pure object-oriented language, just as is Smalltalk. Every data value is an object and all operations are via method calls. The operators in Ruby are only syntactic mechanisms to specify method calls for the corresponding operations. Because they are methods, they can be redefined. All classes, predefined or user defined, can be subclassed.
Both classes and objects in Ruby are dynamic in the sense that methods can be dynamically added to either. This means that both classes and objects can have different sets of methods at different times during execution. So, different instantiations of the same class can behave differently. Collections of methods, data, and constants can be included in the definition of a class.
The syntax of Ruby is related to that of Eiffel and Ada. There is no need to declare variables, because dynamic typing is used. The scope of a variable is specified in its name: A variable whose name begins with a letter has local scope; one that begins with @ is an instance variable; one that begins with $ has global scope. A number of features of Perl are present in Ruby, including implicit variables with silly names, such as $_.
As is the case with Python, any user can extend and/or modify Ruby. Ruby is culturally interesting because it is the first programming language designed in Japan that has achieved relatively widespread use in the United States.
C#, along with the development platform .NET,15 was announced by Microsoft in 2000. In January 2002, production versions of both were released.
C# is based on C++ and Java but includes some ideas from Delphi and Visual Basic. Its lead designer, Anders Hejlsberg, also designed Turbo Pascal and Delphi, which explains the Delphi parts of the heritage of C#.
The purpose of C# is to provide a language for component-based software development, specifically for such development in the .NET Framework. In this environment, components from a variety of languages easily can be combined to form systems. All of the .NET languages, which include C#, VB.NET, Managed C++, F#, and JScript .NET,16 use the common type system (CTS). The CTS provides a common class library. All types in all five .NET languages inherit from a single class root, System.Object. Compilers that conform to the CTS specification create objects that can be combined into software systems. All .NET languages are compiled into the same intermediate form, Intermediate Language (IL).17 Unlike Java, however, the IL is never interpreted. A Just-in-Time compiler is used to translate IL into machine code before it is executed.
Many believe that one of Java’s most important advances over C++ lies in the fact that it excludes some of C++’s features. For example, C++ supports multiple inheritance, pointers, structs, enum types, operator overloading, and a goto statement, but Java includes none of these. The designers of C# obviously disagreed with this wholesale removal of features, because all of these except multiple inheritance have been included in C#.
To the credit of C#’s designers, however, in several cases the C# version of a C++ feature is an improvement. For example, the enum types of C# are safer than those of C++ because they are never implicitly converted to integers. The struct type was changed significantly, resulting in a truly useful construct, whereas in C++ it serves virtually no purpose. C#’s structs are discussed in Chapter 12. C# also attempts to improve the switch statement that is used in C, C++, and Java; C#’s switch is discussed in Chapter 8.
Although C++ includes function pointers, they share the lack of safety that is inherent in C++’s pointers to variables. C# includes a new type, delegates, which are both object-oriented and type-safe references to subprograms. Delegates are used for implementing event handlers and callbacks and for controlling the execution of threads. Callbacks are implemented in Java with interfaces; in C++, method pointers are used.
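The remark that Java implements callbacks with interfaces can be illustrated with a minimal sketch. The names ClickHandler, Button, and CallbackDemo are hypothetical and chosen only for this illustration; they do not come from any library.

```java
import java.util.ArrayList;
import java.util.List;

final class CallbackDemo {
    // Hypothetical callback type: its single method stands in for
    // the subprogram that a C# delegate would reference.
    interface ClickHandler {
        void onClick(String buttonName);
    }

    // Hypothetical event source that stores handlers and calls them back later.
    static final class Button {
        private final String name;
        private final List<ClickHandler> handlers = new ArrayList<>();

        Button(String name) {
            this.name = name;
        }

        void addHandler(ClickHandler handler) {
            handlers.add(handler);
        }

        void click() {
            for (ClickHandler handler : handlers) {
                handler.onClick(name);  // the callback itself
            }
        }
    }

    static List<String> run() {
        List<String> log = new ArrayList<>();
        Button ok = new Button("OK");
        // An anonymous class that implements the interface.
        ok.addHandler(new ClickHandler() {
            @Override
            public void onClick(String buttonName) {
                log.add("anonymous: " + buttonName);
            }
        });
        // A lambda expression (Java 8 and later) is shorthand for the same idea.
        ok.addHandler(buttonName -> log.add("lambda: " + buttonName));
        ok.click();
        return log;
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```

The compiler checks that every registered handler matches the signature of onClick, which is the kind of type safety that raw function pointers lack.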
In C#, methods can take a variable number of parameters, as long as they are all the same type. This is specified by the use of a formal parameter of array type, preceded by the params reserved word.
Both C++ and Java use two distinct typing systems: one for primitives and one for objects. In addition to being confusing, this leads to a frequent need to convert values between the two systems, for example, to put a primitive value into a collection that stores objects. C# makes the conversion between values of the two typing systems partially implicit through its boxing and unboxing operations, which are discussed in detail in Chapter 12.
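As a rough illustration of the conversion between the two typing systems, the following Java sketch writes both conversions out by hand. The class and method names are hypothetical. Since Java 5.0 the compiler can insert these calls automatically (autoboxing), which is similar in spirit to the implicit conversions described for C#; they are spelled out here only to make the two steps visible.

```java
import java.util.ArrayList;
import java.util.List;

final class BoxingDemo {
    // Stores primitive ints in a collection of objects, then sums them again.
    static int sumBoxed(int[] primitives) {
        List<Integer> boxed = new ArrayList<>();   // a collection that stores objects
        for (int value : primitives) {
            boxed.add(Integer.valueOf(value));     // primitive -> object (boxing)
        }
        int sum = 0;
        for (Integer element : boxed) {
            sum += element.intValue();             // object -> primitive (unboxing)
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sumBoxed(new int[] {3, 4, 5}));
    }
}
```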
Among the other features of C# are rectangular arrays, which are not supported in most programming languages, and a foreach statement, which is an iterator for arrays and collection objects. A similar foreach statement is found in Perl, PHP, and Java 5.0. Also, C# includes properties, which are an alternative to public data members. Properties are specified as data members with get and set methods, which are implicitly called when references and assignments are made to the associated data members.
C# has evolved continuously and quickly from its initial release in 2002. The most recent version is C# 7.0. New in C# 7.0 are tuples and a form of pattern matching.
C# was meant to be an improvement over both C++ and Java as a general-purpose programming language. Although it can be argued that some of its features are a step backward, C# includes some constructs that move it beyond its predecessors. Some of its features will surely be adopted by other programming languages of the future.
The following is an example of a C# program:
// C# Example Program
// Input: An integer, listlen, where listlen is less than
//        100, followed by listlen integer values.
// Output: The number of input values that are greater
//         than the average of all input values.
using System;
public class Ch2example {
  static void Main() {
    int[] intList;
    int listlen,
        counter,
        sum = 0,
        average,
        result = 0;
    listlen = Int32.Parse(Console.ReadLine());
    if ((listlen > 0) && (listlen < 100)) {
      // Allocate exactly listlen elements so that the foreach
      // below visits only values that were actually read
      intList = new int[listlen];
      // Read input into an array and compute the sum
      for (counter = 0; counter < listlen; counter++) {
        intList[counter] =
            Int32.Parse(Console.ReadLine());
        sum += intList[counter];
      } //- end of for (counter ...
      // Compute the average (integer division)
      average = sum / listlen;
      // Count the input values that are > average
      foreach (int num in intList)
        if (num > average) result++;
      // Print result
      Console.WriteLine(
          "Number of values > average is:" + result);
    } //- end of if ((listlen ...
    else
      Console.WriteLine(
          "Error--input list length is not legal");
  } //- end of method Main
} //- end of class Ch2example
A markup-programming hybrid language is a markup language in which some of the elements can specify programming actions, such as control flow and computation. The following subsections introduce two such hybrid languages, XSLT and JSP.
The eXtensible Markup Language (XML) is a metamarkup language, that is, a language used to define markup languages. XML-derived markup languages are used to define XML data documents. Although XML documents are human readable, they are processed by computers. This processing sometimes consists only of transformations to alternative forms that can be effectively displayed or printed. In many cases, such transformations are to HTML, which can be displayed by a Web browser. In other cases, the data in the document is processed, just as with other forms of data files.
The transformation of XML documents to HTML documents is specified in another markup language, eXtensible stylesheet language transformations (XSLT) (www.w3.org/TR/XSLT). XSLT can specify programming-like operations. Therefore, XSLT is a markup-programming hybrid language. XSLT was defined by the World Wide Web Consortium (W3C) in the late 1990s.
An XSLT processor is a program that takes as input an XML data document and an XSLT document (which is also in the form of an XML document). In this processing, the XML data document is transformed to another XML document, using the transformations described in the XSLT document. The XSLT document specifies transformations by defining templates, which are data patterns that could be found by the XSLT processor in the XML input file. Associated with each template in the XSLT document are its transformation instructions, which specify how the matching data is to be transformed before being put in the output document. The templates (and their associated processing) therefore act as subprograms, which are “executed” when the XSLT processor finds a pattern match in the data of the XML document.
XSLT also has programming constructs at a lower level. For example, a looping construct is included, which allows repeated parts of the XML document to be selected. There is also a sort process. These lower-level constructs are specified with XSLT tags, such as <for-each>.
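To make the roles of the XSLT processor, templates, looping, and sorting concrete, here is a small sketch that uses the XSLT processor bundled with the Java platform (the javax.xml.transform API). The XML data and the stylesheet are invented for this illustration; in a real stylesheet the looping and sorting elements carry the xsl namespace prefix, as in xsl:for-each and xsl:sort.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class XsltDemo {
    // Hypothetical XML data document.
    static final String XML =
        "<languages>"
      + "<language>Ruby</language>"
      + "<language>C#</language>"
      + "<language>Ada</language>"
      + "</languages>";

    // Hypothetical XSLT document: one template whose pattern matches the
    // root element, with a loop and a sort inside it.
    static final String XSLT =
        "<xsl:stylesheet version=\"1.0\""
      + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
      + "<xsl:output method=\"xml\" omit-xml-declaration=\"yes\"/>"
      + "<xsl:template match=\"/languages\">"
      + "<ul>"
      + "<xsl:for-each select=\"language\">"
      + "<xsl:sort select=\".\"/>"
      + "<li><xsl:value-of select=\".\"/></li>"
      + "</xsl:for-each>"
      + "</ul>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Runs the XSLT processor on the two input documents and returns the output document.
    static String transform(String xml, String xslt) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xslt)));
            StringWriter out = new StringWriter();
            transformer.transform(
                new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(XML, XSLT));
    }
}
```

The template body is "executed" once, when the processor matches the languages element; the loop then emits one list item per language element in sorted order.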
The “core” part of the Java Server Pages Standard Tag Library (JSTL) is another markup-programming hybrid language, although its form and purpose are different from those of XSLT. Before discussing JSTL, it is necessary to introduce the ideas of servlets and Java Server Pages (JSP). A servlet is an instance of a Java class that resides on and is executed on a Web server system. The execution of a servlet is requested by a markup document being displayed by a Web browser. The servlet’s output, which is in the form of an HTML document, is returned to the requesting browser. A program that runs in the Web server process, called a servlet container, controls the execution of servlets. Servlets are commonly used for form processing and for database access.
JSP is a collection of technologies designed to support dynamic Web documents and provide other processing needs of Web documents. When a JSP document, which is often a mixture of HTML and Java, is requested by a browser, the JSP processor program, which resides on a Web server system, converts the document to a servlet. The document’s embedded Java code is copied to the servlet. The plain HTML is copied into Java print statements that output it as is. The JSTL markup in the JSP document is processed, as discussed in the following paragraph. The servlet produced by the JSP processor is run by the servlet container.
The JSTL defines a collection of XML action elements that control the processing of the JSP document on the Web server. These elements have the same form as other elements of HTML and XML. One of the most commonly used JSTL control action elements is if, which specifies a Boolean expression as an attribute. The content of the if element (the text between the opening tag (<if>) and its closing tag (</if>)) is HTML code that will be included in the output document only if the Boolean expression evaluates to true. The if element is related to the C/C++ #if preprocessor command. The JSP container processes the JSTL parts of JSP documents in a way that is similar to how the C/C++ preprocessor processes C and C++ programs. The preprocessor commands are instructions that tell the preprocessor how the output file is to be constructed from the input file. Similarly, JSTL control action elements are instructions that tell the JSP processor how to build the XML output file from the XML input file.
One common use of the if element is for the validation of form data submitted by a browser user. Form data is accessible by the JSP processor and can be tested with the if element to ensure that it is sensible data. If not, the if element can insert an error message for the user in the output document.
For multiple selection control, JSTL has choose, when, and otherwise elements. JSTL also includes a forEach element, which iterates over collections, which typically are form values from a client. The forEach element can include begin, end, and step attributes to control its iterations.
We have investigated the development of a number of programming languages. This chapter gives the reader a good perspective on current issues in language design. We have set the stage for an in-depth discussion of the important features of contemporary languages.
Perhaps the most important source of historical information about the development of early programming languages is History of Programming Languages, edited by Richard Wexelblat (1981). It contains the developmental background and environment of 13 important programming languages, as told by the designers themselves. A similar work resulted from a second “history” conference, published as a special issue of ACM SIGPLAN Notices (ACM, 1993a). In this work, the history and evolution of 13 more programming languages are discussed.
The paper “Early Development of Programming Languages” (Knuth and Pardo, 1977), which is part of the Encyclopedia of Computer Science and Technology, is an excellent 85-page work that details the development of languages up to and including Fortran. The paper includes example programs to demonstrate the features of many of those languages.
Another book of great interest is Programming Languages: History and Fundamentals, by Jean Sammet (1969). It is a 785-page work filled with details of 80 programming languages of the 1950s and 1960s. Sammet has also published several updates to her book, such as Roster of Programming Languages for 1974–75 (1976).
In what year was Plankalkül designed? In what year was that design published?
What two common data structures were included in Plankalkül?
How were the pseudocodes of the early 1950s implemented?
Speedcoding was invented to overcome two significant shortcomings of the computer hardware of the early 1950s. What were they?
Why was the slowness of interpretation of programs acceptable in the early 1950s?
What hardware capability that first appeared in the IBM 704 computer strongly affected the evolution of programming languages? Explain why.
In what year was the Fortran design project begun?
What was the primary application area of computers at the time Fortran was designed?
What was the source of all of the control flow statements of Fortran I?
What was the most significant feature added to Fortran I to get Fortran II?
What control flow statements were added to Fortran IV to get Fortran 77?
Which version of Fortran was the first to have any sort of dynamic variables?
Which version of Fortran was the first to have character string handling?
Why were linguists interested in artificial intelligence in the late 1950s?
Where was Lisp developed? By whom?
In what way are Scheme and Common Lisp opposites of each other?
What dialect of Lisp is used for introductory programming courses at some universities?
What two professional organizations together designed ALGOL 60?
In what version of ALGOL did block structure appear?
What missing language element of ALGOL 60 damaged its chances for widespread use?
What language was designed to describe the syntax of ALGOL 60?
On what programming language was COBOL based?
In what year did the COBOL design process begin?
What data structure that appeared in COBOL originated with Plankalkül?
What organization was most responsible for the early success of COBOL (in terms of extent of use)?
What user group was the target of the first version of Basic?
Why was Basic an important language in the early 1980s?
PL/I was designed to replace what two languages?
For what new line of computers was PL/I designed?
What features of SIMULA 67 are now important parts of some object-oriented languages?
What innovation of data structuring was introduced in ALGOL 68 but is often credited to Pascal?
What design criterion was used extensively in ALGOL 68?
What language introduced the case statement?
What operators in C were modeled on similar operators in ALGOL 68?
What are two characteristics of C that make it less safe than Pascal?
What is a nonprocedural language?
What are the two kinds of statements that populate a Prolog database?
What is the primary application area for which Ada was designed?
What are the concurrent program units of Ada called?
What Ada construct provides support for abstract data types?
What populates the Smalltalk world?
What three concepts are the basis for object-oriented programming?
Why does C++ include the features of C that are known to be unsafe?
What language was Swift designed to replace?
What do the Ada and COBOL languages have in common?
What was the first application for Java?
What characteristic of Java is most evident in JavaScript?
How does the typing system of PHP and JavaScript differ from that of Java?
What array structure is included in C# but not in C, C++, or Java?
What two languages was the original version of Perl meant to replace?
For what application area is JavaScript most widely used?
What is the relationship between JavaScript and PHP, in terms of their use?
PHP’s primary data structure is a combination of what two data structures from other languages?
What data structure does Python use in place of arrays?
What characteristic does Ruby share with Smalltalk?
What characteristic of Ruby’s arithmetic operators makes them unique among those of other languages?
What deficiency of the switch statement of C is addressed with the changes made by C# to that statement?
What is the primary platform on which C# is used?
What are the inputs to an XSLT processor?
What is the output of an XSLT processor?
What element of the JSTL is related to a subprogram?
To what is a JSP document converted by a JSP processor?
Where are servlets executed?
What features of Plankalkül do you think would have had the greatest influence on Fortran 0 if the Fortran designers had been familiar with Plankalkül?
Determine the capabilities of Backus’s 701 Speedcoding system, and compare them with those of a contemporary programmable hand calculator.
Write a short history of the A-0, A-1, and A-2 systems designed by Grace Hopper and her associates.
Compare the facilities of Fortran 0 with those of the Laning and Zierler system.
Which of the three original goals of the ALGOL design committee, in your opinion, was most difficult to achieve at that time?
Make an educated guess as to the most common syntax error in Lisp programs.
Lisp began as a pure functional language but gradually acquired more and more imperative features. Why?
Describe in detail the three most important reasons, in your opinion, why ALGOL 60 did not become a very widely used language.
Why, in your opinion, did COBOL allow long identifiers when Fortran and ALGOL did not?
Outline the major motivation of IBM in developing PL/I.
Was IBM’s assumption, on which it based its decision to develop PL/I, correct, given the history of computers and language developments since 1964?
Describe, in your own words, the concept of orthogonality in programming language design.
What is the primary reason why PL/I became more widely used than ALGOL 68?
What are the arguments both for and against the idea of a typeless language?
Are there any logic programming languages other than Prolog?
What is your opinion of the argument that languages that are too complex are too dangerous to use, and we should therefore keep all languages small and simple?
Do you think language design by committee is a good idea? Support your opinion.
Languages continually evolve. What sort of restrictions do you think are appropriate for changes in programming languages? Compare your answers with the evolution of Fortran.
Build a table identifying all of the major language developments, together with when they occurred, in what language they first appeared, and the identities of the developers.
There have been some public interchanges between Microsoft and Sun concerning the design of Microsoft’s J++ and C# and Sun’s Java. Read some of these documents, which are available on their respective Web sites, and write an analysis of the disagreements concerning the delegates.
In recent years data structures have evolved within scripting languages to replace traditional arrays. Explain the chronological sequence of these developments.
Explain two reasons why pure interpretation is an acceptable implementation method for several recent scripting languages.
Why, in your opinion, do new scripting languages appear more frequently than new compiled languages?
Give a brief general description of a markup-programming hybrid language.
To understand the value of records in a programming language, write a small program in a C-based language that uses an array of structs that store student information, including name, age, GPA as a float, and grade level as a string (e.g., “freshmen,” etc.). Also, write the same program in the same language without using structs.
To understand the value of recursion in a programming language, write a program that implements quicksort, first using recursion and then without recursion.
To understand the value of counting loops, write a program that implements matrix multiplication using counting loop constructs. Then write the same program using only logical loops—for example, while loops.
This chapter begins by defining the terms syntax and semantics. Then, a detailed discussion of the most common method of describing syntax, context-free grammars (also known as Backus-Naur Form), is presented. Included in this discussion are derivations, parse trees, ambiguity, descriptions of operator precedence and associativity, and extended Backus-Naur Form. Attribute grammars, which can be used to describe both the syntax and static semantics of programming languages, are discussed next. In the last section, three formal methods of describing semantics—operational, axiomatic, and denotational semantics—are introduced. Because of the inherent complexity of the semantics description methods, our discussion of them is brief. One could easily write an entire book on just one of the three (as several authors have).
The task of providing a concise yet understandable description of a programming language is difficult but essential to the language’s success. ALGOL 60 and ALGOL 68 were first presented using concise formal descriptions; in both cases, however, the descriptions were not easily understandable, partly because each used a new notation. The levels of acceptance of both languages suffered as a result. On the other hand, some languages have suffered the problem of having many slightly different dialects, a result of a simple but informal and imprecise definition.
One of the problems in describing a language is the diversity of the people who must understand the description. Among these are initial evaluators, implementors, and users. Most new programming languages are subjected to a period of scrutiny by potential users, often people within the organization that employs the language’s designer, before their designs are completed. These are the initial evaluators. The success of this feedback cycle depends heavily on the clarity of the description.
Programming language implementors obviously must be able to determine how the expressions, statements, and program units of a language are formed, and also their intended effect when executed. The difficulty of the implementors’ job is, in part, determined by the completeness and precision of the language description.
Finally, language users must be able to determine how to encode software solutions by referring to a language reference manual. Textbooks and courses enter into this process, but language manuals are usually the only authoritative printed information source about a language.
The study of programming languages, like the study of natural languages, can be divided into examinations of syntax and semantics. The syntax of a programming language is the form of its expressions, statements, and program units. Its semantics is the meaning of those expressions, statements, and program units. For example, the syntax of a Java while statement is
while (boolean_expr) statement
The semantics of this statement form is that when the current value of the Boolean expression is true, the embedded statement is executed. Then control implicitly returns to the Boolean expression to repeat the process. If the Boolean expression is false, control transfers to the statement following the while construct.
Although they are often separated for discussion purposes, syntax and semantics are closely related. In a well-designed programming language, semantics should follow directly from syntax; that is, the appearance of a statement should strongly suggest what the statement is meant to accomplish.
Describing syntax is easier than describing semantics, partly because a concise and universally accepted notation is available for syntax description, but none has yet been developed for semantics.
A language, whether natural (such as English) or artificial (such as Java), is a set of strings of characters from some alphabet. The strings of a language are called sentences or statements. The syntax rules of a language specify which strings of characters from the language’s alphabet are in the language. English, for example, has a large and complex collection of rules for specifying the syntax of its sentences. By comparison, even the largest and most complex programming languages are syntactically very simple.
Formal descriptions of the syntax of programming languages, for simplicity’s sake, often do not include descriptions of the lowest-level syntactic units. These small units are called lexemes. The description of lexemes can be given by a lexical specification, which is usually separate from the syntactic description of the language. The lexemes of a programming language include its numeric literals, operators, and special words, among others. One can think of programs as strings of lexemes rather than of characters.
Lexemes are partitioned into groups—for example, the names of variables, methods, classes, and so forth in a programming language form a group called identifiers. Each lexeme group is represented by a name, or token. So, a token of a language is a category of its lexemes. For example, an identifier is a token that can have lexemes, or instances, such as sum and total. In some cases, a token has only a single possible lexeme. For example, the token for the arithmetic operator symbol + has just one possible lexeme. Consider the following Java statement:
index = 2 * count + 17;
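The lexemes and tokens of this statement are

Lexemes   Tokens
index     identifier
=         equal_sign
2         int_literal
*         mult_op
count     identifier
+         plus_op
17        int_literal
;         semicolon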
The example language descriptions in this chapter are very simple, and most include lexeme descriptions.
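To make the distinction between lexemes and tokens concrete, the following is a minimal Python sketch of a lexical analyzer for the Java statement above. The token names (identifier, int_literal, equal_sign, mult_op, plus_op, semicolon) and the regular expressions are illustrative assumptions, not part of any standard, and the function name tokenize is hypothetical.

```python
import re

# Token names and patterns are illustrative assumptions, not a fixed standard.
TOKEN_SPEC = [
    ("int_literal", r"\d+"),
    ("identifier",  r"[A-Za-z_]\w*"),
    ("equal_sign",  r"="),
    ("mult_op",     r"\*"),
    ("plus_op",     r"\+"),
    ("semicolon",   r";"),
    ("skip",        r"\s+"),   # whitespace separates lexemes but is not one itself
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(text):
    """Return a list of (lexeme, token) pairs for the given source text."""
    pairs = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character {text[pos]!r} at position {pos}")
        if match.lastgroup != "skip":
            # The matched text is the lexeme; the group name is its token.
            pairs.append((match.group(), match.lastgroup))
        pos = match.end()
    return pairs


for lexeme, token in tokenize("index = 2 * count + 17;"):
    print(f"{lexeme:<6} {token}")
```

Note that index and count are different lexemes of the same token, whereas a token such as plus_op has only one possible lexeme.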
In general, languages can be formally defined in two distinct ways: by recognition and by generation (although neither provides a definition that is practical by itself for people trying to learn or use a programming language). Suppose we have a language L that uses an alphabet Σ of characters. To define L formally using the recognition method, we would need to construct a mechanism R, called a recognition device, capable of reading strings of characters from the alphabet Σ. R would indicate whether a given input string was or was not in L. In effect, R would either accept or reject the given string. Such devices are like filters, separating legal sentences from those that are incorrectly formed. If R, when fed any string of characters over Σ, accepts it only if it is in L, then R is a description of L. Because most useful languages are, for all practical purposes, infinite, this might seem like a lengthy and ineffective process. Recognition devices, however, are not used to enumerate all of the sentences of a language; they have a different purpose.
The syntax analysis part of a compiler is a recognizer for the language the compiler translates. In this role, the recognizer need not test all possible strings of characters from some set to determine whether each is in the language. Rather, it need only determine whether given programs are in the language. In effect then, the syntax analyzer determines whether the given programs are syntactically correct. The structure of syntax analyzers, also known as parsers, is discussed in Chapter 4.
A language generator is a device that can be used to generate the sentences of a language. We can think of the generator as having a button that produces a sentence of the language every time it is pushed. Because the particular sentence that is produced by a generator when its button is pushed is unpredictable, a generator seems to be a device of limited usefulness as a language descriptor. However, people prefer certain forms of generators over recognizers because they can more easily read and understand them. By contrast, the syntax-checking portion of a compiler (a language recognizer) is not as useful a language description for a programmer because it can be used only in trial-and-error mode. For example, to determine the correct syntax of a particular statement using a compiler, the programmer can only submit a speculated version and note whether the compiler accepts it. On the other hand, it is often possible to determine whether the syntax of a particular statement is correct by comparing it with the structure of the generator.
There is a close connection between formal generation and recognition devices for the same language. This was one of the seminal discoveries in computer science, and it led to much of what is now known about formal languages and compiler design theory. We return to the relationship of generators and recognizers in the next section.
This section discusses the formal language-generation mechanisms, usually called grammars, that are commonly used to describe the syntax of programming languages.
In the middle to late 1950s, two men, Noam Chomsky and John Backus, in unrelated research efforts, developed the same syntax description formalism, which subsequently became the most widely used method for programming language syntax.
In the mid-1950s, Noam Chomsky, a noted linguist (among other things), described four classes of generative devices or grammars that define four classes of languages (Chomsky, 1956, 1959). Two of these grammar classes, named context-free and regular, turned out to be useful for describing the syntax of programming languages. The forms of the tokens of programming languages can be described by regular grammars. The syntax of whole programming languages, with minor exceptions, can be described by context-free grammars. Because Chomsky was a linguist, his primary interest was the theoretical nature of natural languages. He had no interest at the time in the artificial languages used to communicate with computers. So it was not until later that his work was applied to programming languages.
Shortly after Chomsky’s work on language classes, the ACM-GAMM group began designing ALGOL 58. A landmark paper describing ALGOL 58 was presented by John Backus, a prominent member of the ACM-GAMM group, at an international conference in 1959 (Backus, 1959). This paper introduced a new formal notation for specifying programming language syntax. The new notation was later modified slightly by Peter Naur for the description of ALGOL 60 (Naur, 1960). This revised method of syntax description became known as Backus-Naur Form, or simply BNF.
BNF is a natural notation for describing syntax. In fact, something similar to BNF was used by Panini to describe the syntax of Sanskrit several hundred years before Christ (Ingerman, 1967).
Although the use of BNF in the ALGOL 60 report was not immediately accepted by computer users, it soon became and is still the most popular method of concisely describing programming language syntax.
It is remarkable that BNF is nearly identical to Chomsky’s generative devices for context-free languages, called context-free grammars. In the remainder of the chapter, we refer to context-free grammars simply as grammars. Furthermore, the terms BNF and grammar are used interchangeably.
A metalanguage is a language that is used to describe another language. BNF is a metalanguage for programming languages.
BNF uses abstractions for syntactic structures. A simple Java assignment statement, for example, might be represented by the abstraction <assign> (pointed brackets are often used to delimit names of abstractions). The actual definition of <assign> can be given by
<assign> → <var> = <expression>
The text on the left side of the arrow, which is aptly called the left-hand side (LHS), is the abstraction being defined. The text to the right of the arrow is the definition of the LHS. It is called the right-hand side (RHS) and consists of some mixture of tokens, lexemes, and references to other abstractions. (Actually, tokens are also abstractions.) Altogether, the definition is called a rule, or production. In the example rule just given, the abstractions <var> and <expression> obviously must be defined for the <assign> definition to be useful.
This particular rule specifies that the abstraction <assign> is defined as an instance of the abstraction <var>, followed by the lexeme =, followed by an instance of the abstraction <expression>. One example sentence whose syntactic structure is described by the rule is
total = subtotal1 + subtotal2
The abstractions in a BNF description, or grammar, are often called nonterminal symbols, or simply nonterminals, and the lexemes and tokens of the rules are called terminal symbols, or simply terminals. A BNF description, or grammar, is a collection of rules.
Nonterminal symbols can have two or more distinct definitions, representing two or more possible syntactic forms in the language. Multiple definitions can be written as a single rule, with the different definitions separated by the symbol |, meaning logical OR. For example, a Java if statement can be described with the rules
<if_stmt> → if ( <logic_expr> ) <stmt>
<if_stmt> → if ( <logic_expr> ) <stmt> else <stmt>
or with the rule
<if_stmt> → if ( <logic_expr> ) <stmt>
          | if ( <logic_expr> ) <stmt> else <stmt>
In these rules, <stmt> represents either a single statement or a compound statement.
Although BNF is simple, it is sufficiently powerful to describe nearly all of the syntax of programming languages. In particular, it can describe lists of similar constructs, the order in which different constructs must appear, and nested structures to any depth, and even imply operator precedence and operator associativity.
Variable-length lists in mathematics are often written using an ellipsis ( … ); 1, 2, … is an example. BNF does not include the ellipsis, so an alternative method is required for describing lists of syntactic elements in programming languages (for example, a list of identifiers appearing on a data declaration statement). For BNF, the alternative is recursion. A rule is recursive if its LHS appears in its RHS. The following rules illustrate how recursion is used to describe lists:
<ident_list> → identifier | identifier, <ident_list>
This defines <ident_list> as either a single token (identifier) or an identifier followed by a comma and another instance of <ident_list>. Recursion is used to describe lists in many of the example grammars in the remainder of this chapter.
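The recursive <ident_list> rule maps directly onto a recursive function. The sketch below is one possible recognizer, assuming the input has already been split into a list of lexeme strings and that Python's str.isidentifier is an acceptable stand-in for the identifier token; the name is_ident_list is hypothetical.

```python
def is_ident_list(tokens):
    """Recognize <ident_list> -> identifier | identifier , <ident_list>."""
    # Both alternatives begin with an identifier.
    if not tokens or not tokens[0].isidentifier():
        return False
    # First alternative: a single identifier.
    if len(tokens) == 1:
        return True
    # Second alternative: identifier , <ident_list> (the recursive case).
    return tokens[1] == "," and is_ident_list(tokens[2:])


print(is_ident_list(["sum"]))
print(is_ident_list(["sum", ",", "total", ",", "count"]))
print(is_ident_list(["sum", ","]))
```

The recursion in the function mirrors the recursion in the rule: each call consumes one identifier and one comma, then hands the rest of the list to another instance of the same definition.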
A grammar is a generative device for defining languages. The sentences of the language are generated through a sequence of applications of the rules, beginning with a special nonterminal of the grammar called the start symbol. This sequence of rule applications is called a derivation. In a grammar for a complete programming language, the start symbol represents a complete program and is often named <program>. The simple grammar shown in Example 3.1 is used to illustrate derivations.
<program> → begin <stmt_list> end
<stmt_list> → <stmt> | <stmt> ; <stmt_list>
<stmt> → <var> = <expression>
<var> → A | B | C
<expression> → <var> + <var> | <var> - <var> | <var>
The language described by the grammar of Example 3.1 has only one statement form: assignment. A program consists of the special word begin, followed by a list of statements separated by semicolons, followed by the special word end. An expression is either a single variable or two variables separated by either a + or - operator. The only variable names in this language are A, B, and C.
A derivation of a program in this language follows:
<program> => begin <stmt_list> end
          => begin <stmt> ; <stmt_list> end
          => begin <var> = <expression> ; <stmt_list> end
          => begin A = <expression> ; <stmt_list> end
          => begin A = <var> + <var> ; <stmt_list> end
          => begin A = B + <var> ; <stmt_list> end
          => begin A = B + C ; <stmt_list> end
          => begin A = B + C ; <stmt> end
          => begin A = B + C ; <var> = <expression> end
          => begin A = B + C ; B = <expression> end
          => begin A = B + C ; B = <var> end
          => begin A = B + C ; B = C end
This derivation, like all derivations, begins with the start symbol, in this case <program>. The symbol => is read “derives.” Each successive string in the sequence is derived from the previous string by replacing one of the nonterminals with one of that nonterminal’s definitions. Each of the strings in the derivation, including <program>, is called a sentential form.
In this derivation, the replaced nonterminal is always the leftmost nonterminal in the previous sentential form. Derivations that use this order of replacement are called leftmost derivations. The derivation continues until the sentential form contains no nonterminals. That sentential form, consisting of only terminals, or lexemes, is the generated sentence.
In addition to leftmost, a derivation may be rightmost or in an order that is neither leftmost nor rightmost. Derivation order has no effect on the language generated by a grammar.
By choosing alternative RHSs of rules with which to replace nonterminals in the derivation, different sentences in the language can be generated. By exhaustively choosing all combinations of choices, the entire language can be generated. This language, like most others, is infinite, so one cannot generate all the sentences in the language in finite time.
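A leftmost derivation can be mechanized in a few lines. The following Python sketch encodes the grammar of Example 3.1 as a dictionary and replays a derivation from a list of choices, where each choice is the index of the alternative RHS to apply to the leftmost nonterminal. The data layout and the name leftmost_derivation are assumptions made for illustration.

```python
# Grammar of Example 3.1: each nonterminal maps to its list of alternative RHSs.
GRAMMAR = {
    "<program>":    [["begin", "<stmt_list>", "end"]],
    "<stmt_list>":  [["<stmt>"], ["<stmt>", ";", "<stmt_list>"]],
    "<stmt>":       [["<var>", "=", "<expression>"]],
    "<var>":        [["A"], ["B"], ["C"]],
    "<expression>": [["<var>", "+", "<var>"], ["<var>", "-", "<var>"], ["<var>"]],
}


def leftmost_derivation(start, choices):
    """Yield each sentential form, always replacing the leftmost nonterminal.

    `choices` gives, for each step, the index of the alternative RHS to apply.
    It must not contain more steps than there are nonterminals to replace.
    """
    form = [start]
    yield form
    for choice in choices:
        # Locate the leftmost nonterminal in the current sentential form.
        index = next(i for i, symbol in enumerate(form) if symbol in GRAMMAR)
        form = form[:index] + GRAMMAR[form[index]][choice] + form[index + 1:]
        yield form


# These choices replay the derivation of: begin A = B + C ; B = C end
choices = [0, 1, 0, 0, 0, 1, 2, 0, 0, 1, 2, 2]
for form in leftmost_derivation("<program>", choices):
    print("=>", " ".join(form))
```

Changing any entry in choices selects a different alternative RHS at that step and therefore generates a different sentence of the language.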
Example 3.2 is another example of a grammar for part of a typical programming language.
<assign> → <id> = <expr>
<id> → A | B | C
<expr> → <id> + <expr>
       | <id> * <expr>
       | ( <expr> )
       | <id>
The grammar of Example 3.2 describes assignment statements whose right sides are arithmetic expressions with multiplication and addition operators and parentheses. For example, the statement
A = B * ( A + C )
is generated by the leftmost derivation:
<assign> => <id> = <expr>
         => A = <expr>
         => A = <id> * <expr>
         => A = B * <expr>
         => A = B * ( <expr> )
         => A = B * ( <id> + <expr> )
         => A = B * ( A + <expr> )
         => A = B * ( A + <id> )
         => A = B * ( A + C )
One of the most attractive features of grammars is that they naturally describe the hierarchical syntactic structure of the sentences of the languages they define. These hierarchical structures are called parse trees. For example, the parse tree in Figure 3.1 shows the structure of the assignment statement derived previously.
Figure 3.1: A parse tree for the statement A = B * (A + C)
Every internal node of a parse tree is labeled with a nonterminal symbol; every leaf is labeled with a terminal symbol. Every subtree of a parse tree describes one instance of an abstraction in the sentence.
A grammar that generates a sentential form for which there are two or more distinct parse trees is said to be ambiguous. Consider the grammar shown in Example 3.3, which is a minor variation of the grammar shown in Example 3.2.
<assign> → <id> = <expr>
<id> → A | B | C
<expr> → <expr> + <expr> | <expr> * <expr> | ( <expr> ) | <id>
The grammar of Example 3.3 is ambiguous because the sentence
A = B + C * A
has two distinct parse trees, as shown in Figure 3.2. The ambiguity occurs because the grammar specifies slightly less syntactic structure than does the grammar of Example 3.2. Rather than allowing the parse tree of an expression to grow only on the right, this grammar allows growth on both the left and the right.
Figure 3.2: Two distinct parse trees for the same sentence, A = B + C * A
Syntactic ambiguity of language structures is a problem because compilers often base the semantics of those structures on their syntactic form. Specifically, the compiler chooses the code to be generated for a statement by examining its parse tree. If a language structure has more than one parse tree, then the meaning of the structure cannot be determined uniquely. This problem is discussed in two specific examples in the following subsections.
There are several other characteristics of a grammar that are sometimes useful in determining whether a grammar is ambiguous.¹ A grammar is ambiguous (1) if it generates a sentence with more than one leftmost derivation or (2) if it generates a sentence with more than one rightmost derivation.
Some parsing algorithms can be based on ambiguous grammars. When such a parser encounters an ambiguous construct, it uses nongrammatical information provided by the designer to construct the correct parse tree. In many cases, an ambiguous grammar can be rewritten to be unambiguous but still generate the desired language.
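The ambiguity of the grammar of Example 3.3 can be checked by counting parse trees. The sketch below counts the distinct parse trees of a token sequence taken as an <expr>; because the <assign> and <id> rules offer no structural choice, the count for the right side of an assignment equals the count for the whole statement. The function name count_parse_trees and the interval-based counting approach are assumptions of this sketch, not a method taken from the text.

```python
from functools import lru_cache


def count_parse_trees(tokens):
    """Count distinct parse trees for `tokens` as an <expr> of Example 3.3."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(lo, hi):
        """Number of parse trees deriving tokens[lo:hi] from <expr>."""
        if hi - lo == 1:
            # <expr> -> <id>, with <id> -> A | B | C
            return 1 if tokens[lo] in ("A", "B", "C") else 0
        total = 0
        if hi - lo >= 3 and tokens[lo] == "(" and tokens[hi - 1] == ")":
            # <expr> -> ( <expr> )
            total += count(lo + 1, hi - 1)
        for mid in range(lo + 1, hi - 1):
            if tokens[mid] in ("+", "*"):
                # <expr> -> <expr> + <expr> | <expr> * <expr>, split at this operator
                total += count(lo, mid) * count(mid + 1, hi)
        return total

    return count(0, len(tokens))


print(count_parse_trees("B + C * A".split()))
print(count_parse_trees("B * ( A + C )".split()))
```

A count greater than one for any sentence is enough to show that the grammar is ambiguous; a count of one for a particular sentence says nothing about the grammar as a whole.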
When an expression includes two different operators, for example, x + y * z, one obvious semantic issue is the order of evaluation of the two operators (for example, in this expression is it add and then multiply, or vice versa?). This semantic question can be answered by assigning different precedence levels to operators. For example, if * has been assigned higher precedence than + (by the language designer), multiplication will be done first, regardless of the order of appearance of the two operators in the expression.
As stated previously, a grammar can describe a certain syntactic structure so that part of the meaning of the structure can be determined from its parse tree. In particular, the fact that an operator in an arithmetic expression is generated lower in the parse tree (and therefore must be evaluated first) can be used to indicate that it has precedence over an operator produced higher up in the tree. In the first parse tree of Figure 3.2, for example, the multiplication operator is generated lower in the tree, which could indicate that it has precedence over the addition operator in the expression. The second parse tree, however, indicates just the opposite. It appears, therefore, that the two parse trees indicate conflicting precedence information.
Notice that although the grammar of Example 3.2 is not ambiguous, the precedence order of its operators is not the usual one. In this grammar, a parse tree of a sentence with multiple operators, regardless of the particular operators involved, has the rightmost operator in the expression at the lowest point in the parse tree, with the other operators in the tree moving progressively higher as one moves to the left in the expression. For example, in the expression A + B * C, * is the lowest in the tree, indicating it is to be done first. However, in the expression A * B + C, + is the lowest, indicating it is to be done first.
A grammar can be written for the simple expressions we have been discussing that is both unambiguous and specifies a consistent precedence of the + and * operators, regardless of the order in which the operators appear in an expression. The correct ordering is specified by using separate nonterminal symbols to represent the operands of the operators that have different precedence. This requires additional nonterminals and some new rules. Instead of using <expr> for both operands of both + and *, we could use three nonterminals to represent operands, which allows the grammar to force different operators to different levels in the parse tree. If <expr> is the root symbol for expressions, + can be forced to the top of the parse tree by having <expr> directly generate only + operators, using the new nonterminal, <term>, as the right operand of +. Next, we can define <term> to generate * operators, using <term> as the left operand and a new nonterminal, <factor>, as its right operand. Now, * will always be lower in the parse tree, simply because it is farther from the start symbol than + in every derivation. The grammar of Example 3.4 is such a grammar.
<assign> → <id> = <expr>
<id> → A | B | C
<expr> → <expr> + <term> | <term>
<term> → <term> * <factor> | <factor>
<factor> → ( <expr> ) | <id>
The grammar in Example 3.4 generates the same language as the grammars of Examples 3.2 and 3.3, but it is unambiguous and it specifies the usual precedence order of multiplication and addition operators. The following derivation of the sentence A = B + C * A uses the grammar of Example 3.4:
<assign> => <id> = <expr>
=> A = <expr>
=> A = <expr> + <term>
=> A = <term> + <term>
=> A = <factor> + <term>
=> A = <id> + <term>
=> A = B + <term>
=> A = B + <term> * <factor>
=> A = B + <factor> * <factor>
=> A = B + <id> * <factor>
=> A = B + C * <factor>
=> A = B + C * <id>
=> A = B + C * A
The unique parse tree for this sentence, using the grammar of Example 3.4, is shown in Figure 3.3.
Figure 3.3: The unique parse tree for A = B + C * A using an unambiguous grammar
The connection between parse trees and derivations is very close: Either can easily be constructed from the other. Every derivation with an unambiguous grammar has a unique parse tree, although that tree can be represented by different derivations. For example, the following derivation of the sentence A = B + C * A is different from the derivation of the same sentence given previously. This is a rightmost derivation, whereas the previous one is leftmost. Both of these derivations, however, are represented by the same parse tree.
<assign> => <id> = <expr>
=> <id> = <expr> + <term>
=> <id> = <expr> + <term> * <factor>
=> <id> = <expr> + <term> * <id>
=> <id> = <expr> + <term> * A
=> <id> = <expr> + <factor> * A
=> <id> = <expr> + <id> * A
=> <id> = <expr> + C * A
=> <id> = <term> + C * A
=> <id> = <factor> + C * A
=> <id> = <id> + C * A
=> <id> = B + C * A
=> A = B + C * A
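The precedence that the grammar of Example 3.4 builds into its parse trees can be observed with a small parser. The Python sketch below has one function per nonterminal and returns a nested tuple of the form (operator, left, right), which is a condensed view of the parse tree rather than the tree itself. As an assumption of the sketch, the left-recursive alternatives are handled with loops that keep extending the left operand; the name parse_assign is hypothetical.

```python
def parse_assign(tokens):
    """Parse <id> = <expr> (Example 3.4) into nested (operator, left, right) tuples."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take(expected=None):
        nonlocal pos
        token = peek()
        if token is None or (expected is not None and token != expected):
            raise SyntaxError(f"expected {expected!r}, found {token!r}")
        pos += 1
        return token

    def expr():                           # <expr> -> <expr> + <term> | <term>
        node = term()
        while peek() == "+":
            take("+")
            node = ("+", node, term())    # the tree built so far is the left operand
        return node

    def term():                           # <term> -> <term> * <factor> | <factor>
        node = factor()
        while peek() == "*":
            take("*")
            node = ("*", node, factor())
        return node

    def factor():                         # <factor> -> ( <expr> ) | <id>
        if peek() == "(":
            take("(")
            node = expr()
            take(")")
            return node
        return ident()

    def ident():                          # <id> -> A | B | C
        token = take()
        if token not in ("A", "B", "C"):
            raise SyntaxError(f"expected an identifier, found {token!r}")
        return token

    target = ident()
    take("=")
    tree = ("=", target, expr())
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree


print(parse_assign("A = B + C * A".split()))
print(parse_assign("A = B * C + A".split()))
```

In both results the * node is nested inside the + node, regardless of which operator appears first in the source, because * is only reachable through <term>, one level farther from <expr> than +.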
When an expression includes two operators that have the same precedence (as * and / usually have), for example, A / B * C, a semantic rule is required to specify which should have precedence.² This rule is named associativity.
As was the case with precedence, a grammar for expressions may correctly imply operator associativity. Consider the following example of an assignment statement:
A = B + C + A
The parse tree for this sentence, as defined with the grammar of Example 3.4, is shown in Figure 3.4.
Figure 3.4: A parse tree for A = B + C + A illustrating the associativity of addition
The parse tree of Figure 3.4 shows the left addition operator lower than the right addition operator. This is the correct order if addition is meant to be left associative, which is typical. In most cases, the associativity of addition in a computer is irrelevant. In mathematics, addition is associative, which means that left and right associative orders of evaluation mean the same thing. That is, (A + B) + C = A + (B + C). Floating-point addition in a computer, however, is not necessarily associative. For example, suppose floating-point values store seven digits of accuracy. Consider the problem of adding 11 numbers together, where one of the numbers is 10^7 and the other ten are 1. If the small numbers (the 1's) are each added to the large number, one at a time, there is no effect on that number, because the small numbers occur in the eighth digit of the large number. However, if the small numbers are first added together and the result is added to the large number, the result in seven-digit accuracy is 1.000001 * 10^7. Subtraction and division are not associative, whether in mathematics or in a computer. Therefore, correct associativity may be essential for an expression that contains either of them.
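The seven-digit example can be reproduced with Python's decimal module. This sketch assumes that the large number is 10^7 (an eight-digit value, as the reference to the eighth digit implies) and models seven-digit arithmetic with a decimal context of precision 7, which is a simplification of real binary floating-point hardware.

```python
from decimal import Context, Decimal

seven_digits = Context(prec=7)   # every result is rounded to 7 significant digits
large = Decimal(10) ** 7
ones = [Decimal(1)] * 10

# Order 1: add each 1 to the large number, one at a time.
one_at_a_time = large
for one in ones:
    one_at_a_time = seven_digits.add(one_at_a_time, one)

# Order 2: add the ten 1s together first, then add that sum to the large number.
small_sum = Decimal(0)
for one in ones:
    small_sum = seven_digits.add(small_sum, one)
grouped = seven_digits.add(large, small_sum)

print(one_at_a_time)             # numerically equal to 10**7: the 1s were lost
print(grouped)                   # numerically equal to 1.000001 * 10**7
print(one_at_a_time == grouped)  # False: the two groupings disagree
```

The two loops add the same eleven numbers and differ only in how the additions are grouped, which is exactly what associativity governs.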
When a grammar rule has its LHS also appearing at the beginning of its RHS, the rule is said to be left recursive. This left recursion specifies left associativity. For example, the left recursion of the rules of the grammar of Example 3.4 causes it to make both addition and multiplication left associative. Unfortunately, left recursion disallows the use of some important syntax analysis algorithms. When one of these algorithms is to be used, the grammar must be modified to remove the left recursion. This, in turn, disallows the grammar from precisely specifying that certain operators are left associative. Fortunately, left associativity can be enforced by the compiler, even though the grammar does not dictate it.
In most languages that provide it, the exponentiation operator is right associative. To indicate right associativity, right recursion can be used. A grammar rule is right recursive if the LHS appears at the right end of the RHS. Rules such as
<factor> → <exp> ** <factor> | <exp>
<exp> → ( <expr> ) | id
could be used to describe exponentiation as a right-associative operator.
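A right-recursive rule translates into a function that calls itself for the right operand, which makes the resulting tree grow to the right. The Python sketch below follows the <factor> rule above but, as a simplifying assumption, supports only the id alternative of <exp> and omits the parenthesized form; the names parse_factor and parse_exp are hypothetical.

```python
def parse_factor(tokens, pos=0):
    """Parse <factor> -> <exp> ** <factor> | <exp>; return (tree, next_pos)."""
    left, pos = parse_exp(tokens, pos)
    if pos < len(tokens) and tokens[pos] == "**":
        # Right recursion: the remainder of the input becomes the right operand.
        right, pos = parse_factor(tokens, pos + 1)
        return ("**", left, right), pos
    return left, pos


def parse_exp(tokens, pos):
    """Parse <exp> -> id (the parenthesized alternative is omitted in this sketch)."""
    if pos >= len(tokens) or not tokens[pos].isidentifier():
        raise SyntaxError(f"expected an identifier at position {pos}")
    return tokens[pos], pos + 1


tree, _ = parse_factor("a ** b ** c".split())
print(tree)
```

The rightmost ** ends up lowest in the tree, so a ** b ** c is grouped as a ** (b ** c).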
The BNF rules for a Java if-else statement are as follows:
<if_stmt> → if (<logic_expr>) <stmt>
          | if (<logic_expr>) <stmt> else <stmt>
If we also have <stmt> → <if_stmt>, this grammar is ambiguous. The simplest sentential form that illustrates this ambiguity is
if (<logic_expr>) if (<logic_expr>) <stmt> else <stmt>
The two parse trees in Figure 3.5 show the ambiguity of this sentential form. Consider the following example of this construct:
if (done == true)
  if (denom == 0) quotient = 0;
  else quotient = num / denom;
The problem is that if the upper parse tree in Figure 3.5 is used as the basis for translation, the else clause would be executed when done is not true, which probably is not what was intended by the author of the construct. We will examine the practical problems associated with this else-association problem in Chapter 8.
We will now develop an unambiguous grammar that describes this if statement. The rule for if constructs in many languages is that an else clause, when present, is matched with the nearest previous unmatched then clause. Therefore, there cannot be an if statement without an else between a then clause and its matching else. So, for this situation, statements must be distinguished between those that are matched and those that are unmatched, where unmatched statements are else-less ifs and all other statements are matched. The problem with the earlier grammar is that it treats all statements as if they had equal syntactic significance—that is, as if they were all matched.
To reflect the different categories of statements, different abstractions, or nonterminals, must be used. The unambiguous grammar based on these ideas follows:
<stmt> → <matched> | <unmatched>
<matched> → if (<logic_expr>) <matched> else <matched>
          | any non-if statement
<unmatched> → if (<logic_expr>) <stmt>
            | if (<logic_expr>) <matched> else <unmatched>
There is just one possible parse tree, using this grammar, for the following sentential form:
if (<logic_expr>) if (<logic_expr>) <stmt> else <stmt>
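The nearest-if association that the unambiguous grammar prescribes can be illustrated with a small recursive-descent parser. The following Python sketch is only an illustration under simplifying assumptions: the input is already tokenized, the tokens `c1` and `c2` stand in for a parenthesized <logic_expr>, `s1` and `s2` stand in for non-if statements, and the function name `parse_stmt` and the tuple-based tree are hypothetical. Because the parser looks for an else immediately after parsing the statement controlled by an if, the else attaches to the innermost if that does not yet have one.

```python
def parse_stmt(tokens, pos=0):
    """Parse one statement starting at tokens[pos]; return (tree, next_pos)."""
    if tokens[pos] == "if":
        cond = tokens[pos + 1]              # stands in for (<logic_expr>)
        then_part, pos = parse_stmt(tokens, pos + 2)
        # An else found here belongs to this if, the nearest unmatched one.
        if pos < len(tokens) and tokens[pos] == "else":
            else_part, pos = parse_stmt(tokens, pos + 1)
            return ("if", cond, then_part, else_part), pos
        return ("if", cond, then_part, None), pos
    return tokens[pos], pos + 1             # any non-if statement


tree, _ = parse_stmt(["if", "c1", "if", "c2", "s1", "else", "s2"])
print(tree)  # ('if', 'c1', ('if', 'c2', 's1', 's2'), None)
```

The outer if ends up with no else part, so only one tree is produced for this sentential form.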
Because of a few minor inconveniences in BNF, it has been extended in several ways. Most extended versions are called Extended BNF, or simply EBNF, even though they are not all exactly the same. The extensions do not enhance the descriptive power of BNF; they only increase its readability and writability.
Three extensions are commonly included in the various versions of EBNF. The first of these denotes an optional part of an RHS, which is delimited by brackets. For example, a C if-else statement can be described as
<if_stmt> → if (<expression>) <statement> [else <statement>]
Without the use of the brackets, the syntactic description of this statement would require the following two rules:
<if_stmt> → if (<expression>) <statement> | if (<expression>) <statement> else <statement>
The second extension is the use of braces in an RHS to indicate that the enclosed part can be repeated indefinitely or left out altogether. This extension allows lists to be built with a single rule, instead of using recursion and two rules. For example, lists of identifiers separated by commas can be described by the following rule:
<ident_list> → <identifier> {, <identifier>}
This is a replacement of the recursion by a form of implied iteration; the part enclosed within braces can be iterated any number of times.
The third common extension deals with multiple-choice options. When a single element must be chosen from a group, the options are placed in parentheses and separated by the OR operator, |. For example,
<term> → <term> (* | / | %) <factor>
In BNF, a description of this <term> would require the following three rules:
<term> → <term> * <factor> | <term> / <factor> | <term> % <factor>
The brackets, braces, and parentheses in the EBNF extensions are metasymbols, which means they are notational tools and not terminal symbols in the syntactic entities they help describe. In cases where these metasymbols are also terminal symbols in the language being described, the instances that are terminal symbols can be underlined or quoted. Example 3.5 illustrates the use of braces and multiple choices in an EBNF grammar.
BNF:
<expr> → <expr> + <term> | <expr> - <term> | <term>
<term> → <term> * <factor> | <term> / <factor> | <factor>
<factor> → <exp> ** <factor> | <exp>
<exp> → (<expr>) | id
EBNF:
<expr> → <term> {(+ | -) <term>}
<term> → <factor> {(* | /) <factor>}
<factor> → <exp> { ** <exp>}
<exp> → (<expr>) | id
The BNF rule
<expr> → <expr> + <term>
clearly specifies—in fact forces—the + operator to be left associative. However, the EBNF version,
<expr> → <term> {+ <term>}
does not imply the direction of associativity. This problem is overcome in a syntax analyzer based on an EBNF grammar for expressions by designing the syntax analysis process to enforce the correct associativity. This is discussed further in Chapter 4.
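One way a syntax analyzer can enforce associativity for the brace form is sketched below in Python. This is a hedged illustration, not the method presented in Chapter 4: the function names, the token lists, and the tuple-based trees are assumptions, and the ** level of the grammar is omitted to keep the sketch short. Each brace in the EBNF rule becomes a while loop, and folding every new operand into the tree built so far makes the operators left associative.

```python
def parse_expr(tokens, pos=0):
    """<expr> -> <term> {(+ | -) <term>}; returns (tree, next_pos)."""
    left, pos = parse_term(tokens, pos)
    while pos < len(tokens) and tokens[pos] in ("+", "-"):
        op = tokens[pos]
        right, pos = parse_term(tokens, pos + 1)
        left = (op, left, right)    # earlier operands nest deeper: left associative
    return left, pos


def parse_term(tokens, pos):
    """<term> -> <factor> {(* | /) <factor>}"""
    left, pos = parse_factor(tokens, pos)
    while pos < len(tokens) and tokens[pos] in ("*", "/"):
        op = tokens[pos]
        right, pos = parse_factor(tokens, pos + 1)
        left = (op, left, right)
    return left, pos


def parse_factor(tokens, pos):
    """Simplified operand: ( <expr> ) | id  (exponentiation is left out)."""
    if tokens[pos] == "(":
        tree, pos = parse_expr(tokens, pos + 1)
        if pos >= len(tokens) or tokens[pos] != ")":
            raise SyntaxError("expected ')'")
        return tree, pos + 1
    return tokens[pos], pos + 1


tree, _ = parse_expr(["a", "-", "b", "-", "c"])
print(tree)  # ('-', ('-', 'a', 'b'), 'c')
```

Building `(op, left, right)` inside the loop is the design decision that fixes the direction; collecting the operands first and folding from the right would give right associativity from the same EBNF rule.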
Some versions of EBNF allow a numeric superscript to be attached to the right brace to indicate an upper limit to the number of times the enclosed part can be repeated. Also, some versions use a plus (+) superscript to indicate one or more repetitions. For example,
<compound> → begin <stmt> {<stmt>} end
and
<compound> → begin {<stmt>}+ end
are equivalent.
In recent years, some variations on BNF and EBNF have appeared. Among these are the following:
In place of the arrow, a colon is used and the RHS is placed on the next line.
Instead of a vertical bar to separate alternative RHSs, they are simply placed on separate lines.
In place of square brackets to indicate something being optional, the subscript opt is used. For example,
Rather than using the | symbol in a parenthesized list of elements to indicate a choice, the words “one of” are used. For example,
AssignmentOperator → one of = *= /= %= += -=
<<= >>= &= ^= |=
There is a standard for EBNF, ISO/IEC 14977:1996 (1996), but it is rarely used. The standard uses the equal sign (=) instead of an arrow in rules, terminates each RHS with a semicolon, and requires quotes on all terminal symbols. It also specifies a host of other notational rules.
Earlier in this chapter, we suggested that there is a close relationship between generation and recognition devices for a given language. In fact, given a context-free grammar, a recognizer for the language generated by the grammar can be algorithmically constructed. A number of software systems have been developed that perform this construction. Such systems allow the quick creation of the syntax analysis part of a compiler for a new language and are therefore quite valuable. One of the first of these syntax analyzer generators is named yacc (yet another compiler compiler) (Johnson, 1975). There are now many such systems available.
An attribute grammar is a device used to describe more of the structure of a programming language than can be described with a context-free grammar. An attribute grammar is an extension to a context-free grammar. The extension allows certain language rules to be conveniently described, such as type compatibility. Before we formally define the form of attribute grammars, we must clarify the concept of static semantics.
Attribute grammars have been used in a wide variety of applications. They have been used to provide complete descriptions of the syntax and static semantics of programming languages (Watt, 1979); they have been used as the formal definition of a language that can be input to a compiler generation system (Farrow, 1982); and they have been used as the basis of several syntax-directed editing systems (Teitelbaum and Reps, 1981; Fischer et al., 1984). In addition, attribute grammars have been used in natural-language processing systems (Correa, 1992).
There are some characteristics of programming languages that are difficult to describe with BNF, and some that are impossible. As an example of a syntax rule that is difficult to specify with BNF, consider type compatibility rules. In Java, for example, a floating-point value cannot be assigned to an integer type variable, although the opposite is legal. Although this restriction can be specified in BNF, it requires additional nonterminal symbols and rules. If all of the typing rules of Java were specified in BNF, the grammar would become too large to be useful, because the size of the grammar determines the size of the syntax analyzer.
As an example of a syntax rule that cannot be specified in BNF, consider the common rule that all variables must be declared before they are referenced. It has been proven that this rule cannot be specified in BNF.
These problems exemplify the categories of language rules called static semantics rules. The static semantics of a language is only indirectly related to the meaning of programs during execution; rather, it has to do with the legal forms of programs (syntax rather than semantics). Many static semantic rules of a language state its type constraints. Static semantics is so named because the analysis required to check these specifications can be done at compile time.
Because of the problems of describing static semantics with BNF, a variety of more powerful mechanisms has been devised for that task. One such mechanism, attribute grammars, was designed by Knuth (1968) to describe both the syntax and the static semantics of programs.
Attribute grammars are a formal approach both to describing and checking the correctness of the static semantics rules of a program. Although they are not always used in a formal way in compiler design, the basic concepts of attribute grammars are at least informally used in every compiler (see Aho et al., 1988).
Dynamic semantics, which is the meaning of expressions, statements, and program units, is discussed in Section 3.5.
Attribute grammars are context-free grammars to which have been added attributes, attribute computation functions, and predicate functions. Attributes, which are associated with grammar symbols (the terminal and nonterminal symbols), are similar to variables in the sense that they can have values assigned to them. Attribute computation functions, sometimes called semantic functions, are associated with grammar rules. They are used to specify how attribute values are computed. Predicate functions, which state the static semantic rules of the language, are associated with grammar rules.
These concepts will become clearer after we formally define attribute grammars and provide an example.
An attribute grammar is a grammar with the following additional features:
Associated with each grammar symbol X is a set of attributes A(X). The set A(X) consists of two disjoint sets S(X) and I(X), called synthesized and inherited attributes, respectively. Synthesized attributes are used to pass semantic information up a parse tree, while inherited attributes pass semantic information down and across a tree.
Associated with each grammar rule is a set of semantic functions and a possibly empty set of predicate functions over the attributes of the symbols in the grammar rule. For a rule $X_0 \to X_1 \ldots X_n$, the synthesized attributes of $X_0$ are computed with semantic functions of the form $S(X_0) = f(A(X_1), \ldots, A(X_n))$. So the value of a synthesized attribute on a parse tree node depends only on the values of the attributes on that node's children nodes. Inherited attributes of symbols $X_j$, $1 \le j \le n$ (in the rule above), are computed with a semantic function of the form $I(X_j) = f(A(X_0), \ldots, A(X_n))$. So the value of an inherited attribute on a parse tree node depends on the attribute values of that node's parent node and those of its sibling nodes. Note that, to avoid circularity, inherited attributes are often restricted to functions of the form $I(X_j) = f(A(X_0), \ldots, A(X_{j-1}))$. This form prevents an inherited attribute from depending on itself or on attributes to the right in the parse tree.
A predicate function has the form of a Boolean expression on the union of the attribute set $\{A(X_0), \ldots, A(X_n)\}$ and a set of literal attribute values. The only derivations allowed with an attribute grammar are those in which every predicate associated with every nonterminal is true. A false predicate function value indicates a violation of the syntax or static semantics rules of the language.
A parse tree of an attribute grammar is the parse tree based on its underlying BNF grammar, with a possibly empty set of attribute values attached to each node. If all the attribute values in a parse tree have been computed, the tree is said to be fully attributed. Although in practice it is not always done this way, it is convenient to think of attribute values as being computed after the complete unattributed parse tree has been constructed by the compiler.
Intrinsic attributes are synthesized attributes of leaf nodes whose values are determined outside the parse tree. For example, the type of an instance of a variable in a program could come from the symbol table, which is used to store variable names and their types. The contents of the symbol table are set based on earlier declaration statements. Initially, assuming that an unattributed parse tree has been constructed and that attribute values are needed, the only attributes with values are the intrinsic attributes of the leaf nodes. Given the intrinsic attribute values on a parse tree, the semantic functions can be used to compute the remaining attribute values.
As a very simple example of how attribute grammars can be used to describe static semantics, consider the following fragment of an attribute grammar that describes the rule that the name on the end of an Ada procedure must match the procedure’s name. (This rule cannot be stated in BNF.) The string attribute of <proc_name>, denoted by <proc_name>.string, is the actual string of characters that were found immediately following the reserved word procedure by the compiler. Notice that when there is more than one occurrence of a nonterminal in a syntax rule in an attribute grammar, the nonterminals are subscripted with brackets to distinguish them. Neither the subscripts nor the brackets are part of the described language.
Syntax rule: <proc_def> → procedure <proc_name>[1]
                          <proc_body> end <proc_name>[2];
Predicate: <proc_name>[1].string == <proc_name>[2].string
In this example, the predicate rule states that the name string attribute of the <proc_name> nonterminal in the subprogram header must match the name string attribute of the <proc_name> nonterminal following the end of the subprogram.
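To make the predicate concrete, the following Python sketch checks it on source text. It is an illustration only: the function name `check_proc_def` and the regular expressions are assumptions, a real compiler would take the two strings from its parse tree rather than from pattern matching, and the comparison is an exact string match as the predicate is written, ignoring the fact that Ada identifiers are case-insensitive.

```python
import re


def check_proc_def(source):
    """Return True if the name after 'end' equals the name after 'procedure'."""
    header = re.search(r"\bprocedure\s+(\w+)", source)
    trailer = re.search(r"\bend\s+(\w+)\s*;\s*$", source)
    if header is None or trailer is None:
        raise SyntaxError("not a <proc_def>")
    # Predicate: <proc_name>[1].string == <proc_name>[2].string
    return header.group(1) == trailer.group(1)


print(check_proc_def("procedure Swap is begin null; end Swap;"))  # True
print(check_proc_def("procedure Swap is begin null; end Sort;"))  # False
```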
Next, we consider a larger example of an attribute grammar. In this case, the example illustrates how an attribute grammar can be used to check the type rules of a simple assignment statement. The syntax and static semantics of this assignment statement are as follows: The only variable names are A, B, and C. The right side of the assignments can be either a variable or an expression in the form of a variable added to another variable. The variables can be one of two types: int or real. When there are two variables on the right side of an assignment, they need not be the same type. The type of the expression when the operand types are not the same is always real. When they are the same, the expression type is that of the operands. The type of the left side of the assignment must match the type of the right side. So the types of operands in the right side can be mixed, but the assignment is valid only if the target and the value resulting from evaluating the right side have the same type. The attribute grammar specifies these static semantic rules.
The syntax portion of our example attribute grammar is
<assign> → <var> = <expr>
<expr> → <var> + <var> | <var>
<var> → A | B | C
The attributes for the nonterminals in the example attribute grammar are described in the following paragraphs:
actual_type—A synthesized attribute associated with the nonterminals <var> and <expr>. It is used to store the actual type, int or real, of a variable or expression. In the case of a variable, the actual type is intrinsic. In the case of an expression, it is determined from the actual types of the child node or children nodes of the <expr> nonterminal.
expected_type—An inherited attribute associated with the nonterminal <expr>. It is used to store the type, either int or real, that is expected for the expression, as determined by the type of the variable on the left side of the assignment statement.
The complete attribute grammar follows in Example 3.6.
1. Syntax rule: <assign> → <var> = <expr>
   Semantic rule: <expr>.expected_type ← <var>.actual_type
2. Syntax rule: <expr> → <var>[2] + <var>[3]
   Semantic rule: <expr>.actual_type ←
                      if (<var>[2].actual_type = int) and
                         (<var>[3].actual_type = int)
                      then int
                      else real
                      end if
   Predicate: <expr>.actual_type == <expr>.expected_type
3. Syntax rule: <expr> → <var>
   Semantic rule: <expr>.actual_type ← <var>.actual_type
   Predicate: <expr>.actual_type == <expr>.expected_type
4. Syntax rule: <var> → A | B | C
   Semantic rule: <var>.actual_type ← look-up(<var>.string)
The look-up function looks up a given variable name in the symbol table and returns the variable’s type.
A parse tree of the sentence A = A + B generated by the grammar in Example 3.6 is shown in Figure 3.6. As in the grammar, bracketed numbers are added after the repeated node labels in the tree so they can be referenced unambiguously.
Now, consider the process of computing the attribute values of a parse tree, which is sometimes called decorating the parse tree. If all attributes were inherited, this could proceed in a completely top-down order, from the root to the leaves. Alternatively, it could proceed in a completely bottom-up order, from the leaves to the root, if all the attributes were synthesized. Because our grammar has both synthesized and inherited attributes, the evaluation process cannot be in any single direction. The following is an evaluation of the attributes, in an order in which it is possible to compute them:
1. <var>.actual_type ← look-up(A) (Rule 4)
2. <expr>.expected_type ← <var>.actual_type (Rule 1)
3. <var>[2].actual_type ← look-up(A) (Rule 4)
   <var>[3].actual_type ← look-up(B) (Rule 4)
4. <expr>.actual_type ← either int or real (Rule 2)
5. <expr>.expected_type == <expr>.actual_type is either TRUE or FALSE (Rule 2)
The tree in Figure 3.7 shows the flow of attribute values in the example of Figure 3.6. Solid lines show the parse tree; dashed lines show attribute flow in the tree.
The tree in Figure 3.8 shows the final attribute values on the nodes. In this example, A is defined as a real and B is defined as an int.
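The evaluation of the attributes for A = A + B can be traced with a short Python sketch. It is a minimal illustration under stated assumptions: the symbol table is a plain dictionary, the entry for C is invented (the text gives types only for A and B), and the function `check_assign` and the attribute keys are hypothetical names rather than part of the attribute grammar. The comments cite the grammar rule that each computation comes from.

```python
# A is real and B is int, as in the example; the type of C is an assumption.
SYMBOL_TABLE = {"A": "real", "B": "int", "C": "int"}


def look_up(name):
    """Return the declared type of a variable name."""
    return SYMBOL_TABLE[name]


def check_assign(target, operands):
    """Decorate the tree for: target = operands[0] [+ operands[1]]."""
    attrs = {}
    # Rule 4: intrinsic type of the <var> on the left side.
    attrs["var.actual_type"] = look_up(target)
    # Rule 1: the inherited attribute flows down to <expr>.
    attrs["expr.expected_type"] = attrs["var.actual_type"]
    # Rule 4: intrinsic types of the variables on the right side.
    operand_types = [look_up(name) for name in operands]
    # Rule 2 (two operands) or Rule 3 (one operand): synthesized attribute.
    if len(operand_types) == 2:
        both_int = operand_types[0] == "int" and operand_types[1] == "int"
        attrs["expr.actual_type"] = "int" if both_int else "real"
    else:
        attrs["expr.actual_type"] = operand_types[0]
    # Predicate of Rule 2 or Rule 3.
    attrs["predicate"] = attrs["expr.actual_type"] == attrs["expr.expected_type"]
    return attrs


result = check_assign("A", ["A", "B"])
print(result["expr.expected_type"], result["expr.actual_type"], result["predicate"])
# real real True
```

With the same symbol table, `check_assign("B", ["A", "B"])` yields a false predicate, because the expression type is real while the target is int.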
Determining attribute evaluation order for the general case of an attribute grammar is a complex problem, requiring the construction of a dependency graph to show all attribute dependencies.
Checking the static semantic rules of a language is an essential part of all compilers. Even if a compiler writer has never heard of an attribute grammar, he or she would need to use the fundamental ideas of attribute grammars to design the checks of static semantics rules for his or her compiler.
One of the main difficulties in using an attribute grammar to describe all of the syntax and static semantics of a real contemporary programming language is the size and complexity of the attribute grammar. The large number of attributes and semantic rules required for a complete programming language make such grammars difficult to write and read. Furthermore, the attribute values on a large parse tree are costly to evaluate. On the other hand, less formal attribute grammars are a powerful and commonly used tool for compiler writers, who are more interested in the process of producing a compiler than they are in formalism.
We now turn to the difficult task of describing the dynamic semantics, or meaning, of the expressions, statements, and program units of a programming language. Because of the power and naturalness of the available notation, describing syntax is a relatively simple matter. On the other hand, no universally accepted notation or approach has been devised for dynamic semantics. In this section, we briefly describe several of the methods that have been developed. For the remainder of this section, when we use the term semantics, we mean dynamic semantics.
There are several different reasons underlying the need for a methodology and notation for describing semantics. Programmers obviously need to know precisely what the statements of a language do before they can use them effectively in their programs. Compiler writers must know exactly what language constructs mean to design implementations for them correctly. If there were a precise semantics specification of a programming language, programs written in the language potentially could be proven correct without testing. Also, compilers could be shown to produce programs that exhibited exactly the behavior given in the language definition; that is, their correctness could be verified. A complete specification of the syntax and semantics of a programming language could be used by a tool to generate a compiler for the language automatically. Finally, language designers, who would develop the semantic descriptions of their languages, could in the process discover ambiguities and inconsistencies in their designs.
Software developers and compiler designers typically determine the semantics of programming languages by reading English explanations in language manuals. Because such explanations are often imprecise and incomplete, this approach is clearly unsatisfactory. Due to the lack of complete semantics specifications of programming languages, programs are rarely proven correct without testing, and commercial compilers are never generated automatically from language descriptions.
Scheme, a functional language described in Chapter 15, is one of only a few programming languages whose definition includes a formal semantics description. However, the method used is not one described in this chapter, as this chapter is focused on approaches that are suitable for imperative languages.
The idea behind operational semantics is to describe the meaning of a statement or program by specifying the effects of running it on a machine. The effects on the machine are viewed as the sequence of changes in its state, where the machine’s state is the collection of the values in its storage. An obvious operational semantics description, then, is given by executing a compiled version of the program on a computer. Most programmers have, on at least one occasion, written a small test program to determine the meaning of some programming language construct, often while learning the language. Essentially, what such a programmer is doing is using operational semantics to determine the meaning of the construct.
There are several problems with using this approach for complete formal semantics descriptions. First, the individual steps in the execution of machine language and the resulting changes to the state of the machine are too small and too numerous. Second, the storage of a real computer is too large and complex. There are usually several levels of memory devices, as well as connections to innumerable other computers and memory devices through networks. Therefore, machine languages and real computers are not used for formal operational semantics. Rather, intermediate-level languages and interpreters for idealized computers are designed specifically for the process.
There are different levels of uses of operational semantics. At the highest level, the interest is in the final result of the execution of a complete program. This is sometimes called natural operational semantics. At the lowest level, operational semantics can be used to determine the precise meaning of a program through an examination of the complete sequence of state changes that occur when the program is executed. This use is sometimes called structural operational semantics.
The first step in creating an operational semantics description of a language is to design an appropriate intermediate language, where the primary desired characteristic of the language is clarity. Every construct of the intermediate language must have an obvious and unambiguous meaning. This language is at the intermediate level, because machine language is too low-level to be easily understood and another high-level language is obviously not suitable. If the semantics description is to be used for natural operational semantics, a virtual machine (an interpreter) must be constructed for the intermediate language. The virtual machine can be used to execute either single statements, code segments, or whole programs. The semantics description can be used without a virtual machine if the meaning of a single statement is all that is required. In this use, which is structural operational semantics, the intermediate code can be visually inspected.
The basic process of operational semantics is not unusual. In fact, the concept is frequently used in programming textbooks and programming language reference manuals. For example, the semantics of the C for construct can be described in terms of simpler statements, as in
The human reader of such a description is the virtual computer and is assumed to be able to “execute” the instructions in the definition correctly and recognize the effects of the “execution.”
The intermediate language and its associated virtual machine used for formal operational semantics descriptions are often highly abstract. The intermediate language is meant to be convenient for the virtual machine, rather than for human readers. For our purposes, however, a more human-oriented intermediate language could be used. As such an example, consider the following list of statements, which would be adequate for describing the semantics of the simple control statements of a typical programming language:
ident = var
ident = ident + 1
ident = ident - 1
goto label
if var relop var goto label
In these statements, relop is one of the relational operators from the set {=, <>, >, <, >=, <=}, ident is an identifier, and var is either an identifier or a constant. These statements are all simple and therefore easy to understand and implement.
A slight generalization of these three assignment statements allows more general arithmetic expressions and assignment statements to be described. The new statements are
ident = var bin_op var
ident = un_op var
where bin_op is a binary arithmetic operator and un_op is a unary operator. Multiple arithmetic data types and automatic type conversions, of course, complicate this generalization. Adding just a few more relatively simple instructions would allow the semantics of arrays, records, pointers, and subprograms to be described.
In Chapter 8, the semantics of various control statements are described using this intermediate language.
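For natural operational semantics, the text says a virtual machine (an interpreter) must be constructed for the intermediate language. The following is a minimal sketch of such an interpreter in Python for the five basic statement forms. It assumes statements are whitespace-separated strings and that a label is written as a `name:` prefix; that label syntax, and the names `run`, `value`, `RELOPS`, and `max_steps`, are conventions invented here for illustration.

```python
import operator

# Relational operators of the intermediate language, keyed by their spelling.
RELOPS = {
    "=": operator.eq, "<>": operator.ne,
    ">": operator.gt, "<": operator.lt,
    ">=": operator.ge, "<=": operator.le,
}

def value(var, env):
    """A var is either an integer constant or an identifier looked up in env."""
    try:
        return int(var)
    except ValueError:
        return env[var]

def run(program, env=None, max_steps=10_000):
    """Execute a list of statement strings and return the final variable values."""
    env = dict(env or {})
    labels, code = {}, []
    for line in program:
        line = line.strip()
        if ":" in line:                      # assumed label syntax: "loop: stmt"
            label, line = (part.strip() for part in line.split(":", 1))
            labels[label] = len(code)
        code.append(line.split())
    pc = 0
    for _ in range(max_steps):
        if pc >= len(code):
            return env
        tok = code[pc]
        pc += 1
        if not tok:                          # a bare label with no statement
            continue
        if tok[0] == "goto":                 # goto label
            pc = labels[tok[1]]
        elif tok[0] == "if":                 # if var relop var goto label
            _, left, relop, right, _, label = tok
            if RELOPS[relop](value(left, env), value(right, env)):
                pc = labels[label]
        elif len(tok) == 3:                  # ident = var
            env[tok[0]] = value(tok[2], env)
        else:                                # ident = ident + 1 / ident = ident - 1
            delta = 1 if tok[3] == "+" else -1
            env[tok[0]] = value(tok[2], env) + delta
    raise RuntimeError("step limit exceeded")

# Count n down to zero, in the same shape as the translated C for loop.
demo = [
    "count = 0",
    "loop: if n <= 0 goto out",
    "n = n - 1",
    "count = count + 1",
    "goto loop",
    "out:",
]
result = run(demo, {"n": 3})
```

The step limit is a practical guard only: a program in this language may loop forever, and the sketch raises an error instead of hanging.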
The first and most significant use of formal operational semantics was to describe the semantics of PL/I (Wegner, 1972). That particular abstract machine and the translation rules for PL/I were together named the Vienna Definition Language (VDL), after the city where IBM designed it.
Operational semantics provides an effective means of describing semantics for language users and language implementors, as long as the descriptions are kept simple and informal. The VDL description of PL/I, unfortunately, is so complex that it serves no practical purpose.
Operational semantics depends on programming languages of lower levels, not mathematics. The statements of one programming language are described in terms of the statements of a lower-level programming language. This approach can lead to circularities, in which concepts are indirectly defined in terms of themselves. The methods described in the following two sections are much more formal, in the sense that they are based on mathematics and logic, not programming languages.
Denotational semantics is the most rigorous and most widely known formal method for describing the meaning of programs. It is solidly based on recursive function theory. A thorough discussion of the use of denotational semantics to describe the semantics of programming languages is necessarily long and complex. It is our intent to provide the reader with an introduction to the central concepts of denotational semantics, along with a few simple examples that are relevant to programming language specifications.
The process of constructing a denotational semantics specification for a programming language requires one to define for each language entity both a mathematical object and a function that maps instances of that language entity onto instances of the mathematical object. Because the objects are rigorously defined, they model the exact meaning of their corresponding entities. The idea is based on the fact that there are rigorous ways of manipulating mathematical objects but not programming language constructs. The difficulty with this method lies in creating the objects and the mapping functions. The method is named denotational because the mathematical objects denote the meaning of their corresponding syntactic entities.
The mapping functions of a denotational semantics programming language specification, like all functions in mathematics, have a domain and a range. The domain is the collection of values that are legitimate parameters to the function; the range is the collection of objects to which the parameters are mapped. In denotational semantics, the domain is called the syntactic domain, because it is syntactic structures that are mapped. The range is called the semantic domain.
Denotational semantics is related to operational semantics. In operational semantics, programming language constructs are translated into simpler programming language constructs, which become the basis of the meaning of the construct. In denotational semantics, programming language constructs are mapped to mathematical objects, either sets or, more often, functions. However, unlike operational semantics, denotational semantics does not model the step-by-step computational processing of programs.
We use a very simple language construct, character string representations of binary numbers, to introduce the denotational method. The syntax of such binary numbers can be described by the following grammar rules:
<bin_num> → '0'
          | '1'
          | <bin_num> '0'
          | <bin_num> '1'
A parse tree for the example binary number, 110, is shown in Figure 3.9. Notice that we put apostrophes around the syntactic digits to show they are not mathematical digits. This is similar to the relationship between ASCII coded digits and mathematical digits. When a program reads a number as a string, it must be converted to a mathematical number before it can be used as a value in the program.
The syntactic domain of the mapping function for binary numbers is the set of all character string representations of binary numbers. The semantic domain is the set of nonnegative decimal numbers, symbolized by N.
To describe the meaning of binary numbers using denotational semantics, we associate the actual meaning (a decimal number) with each rule that has a single terminal symbol as its RHS.
In our example, decimal numbers must be associated with the first two grammar rules. The other two grammar rules are, in a sense, computational rules, because they combine a terminal symbol, to which an object can be associated, with a nonterminal, which can be expected to represent some construct. Presuming an evaluation that progresses upward in the parse tree, the nonterminal in the right side would already have its meaning attached. So, a syntax rule with a nonterminal as its RHS would require a function that computed the meaning of the LHS, which represents the meaning of the complete RHS.
The semantic function, named M_bin, maps the syntactic objects, as described in the previous grammar rules, to the objects in N, the set of nonnegative decimal numbers. The function M_bin is defined as follows:
M_bin('0') = 0
M_bin('1') = 1
M_bin(<bin_num> '0') = 2 * M_bin(<bin_num>)
M_bin(<bin_num> '1') = 2 * M_bin(<bin_num>) + 1
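The four defining equations of the binary-numeral mapping translate almost line for line into a recursive function. The sketch below, in Python, assumes the numeral is given as a string of `'0'` and `'1'` characters; the name `m_bin` and the `ValueError` for ill-formed input are choices made here, since the text assumes syntactically correct input.

```python
def m_bin(bits: str) -> int:
    """Map a character-string binary numeral to the number it denotes."""
    if bits == "0":          # rule: '0' denotes 0
        return 0
    if bits == "1":          # rule: '1' denotes 1
        return 1
    if len(bits) < 2 or bits[-1] not in "01":
        raise ValueError(f"not a binary numeral: {bits!r}")
    # <bin_num> '0' denotes 2 * meaning(<bin_num>);
    # <bin_num> '1' denotes 2 * meaning(<bin_num>) + 1.
    return 2 * m_bin(bits[:-1]) + (1 if bits[-1] == "1" else 0)
```

For the example numeral, `m_bin("110")` unfolds to `2 * (2 * 1 + 1) + 0`, which is 6. The recursion follows the left-recursive grammar rule, so each call corresponds to one interior node of the parse tree.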
The meanings, or denoted objects (which in this case are decimal numbers), can be attached to the nodes of the parse tree for 110 shown in Figure 3.9, yielding the tree in Figure 3.10. This is syntax-directed semantics. Syntactic entities are mapped to mathematical objects with concrete meaning.
In part because we need it later, we now show a similar example for describing the meaning of syntactic decimal literals. In this case, the syntactic domain is the set of character string representations of decimal numbers. The semantic domain is once again the set N.
<dec_num> → '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9'
          | <dec_num> ('0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9')
The denotational mappings for these syntax rules are
M_dec('0') = 0, M_dec('1') = 1, M_dec('2') = 2, . . . , M_dec('9') = 9
M_dec(<dec_num> '0') = 10 * M_dec(<dec_num>)
M_dec(<dec_num> '1') = 10 * M_dec(<dec_num>) + 1
. . .
M_dec(<dec_num> '9') = 10 * M_dec(<dec_num>) + 9
In the following sections, we present the denotational semantics descriptions of a few simple constructs. The most important simplifying assumption made here is that both the syntax and static semantics of the constructs are correct. In addition, we assume that only two scalar types are included: integer and Boolean.
The denotational semantics of a program could be defined in terms of state changes in an ideal computer. Operational semantics are defined in this way, and denotational semantics are defined in nearly the same way. In a further simplification, however, denotational semantics is defined in terms of only the values of all of the program’s variables. So, denotational semantics uses the state of the program to describe meaning, whereas operational semantics uses the state of a machine. The key difference between operational semantics and denotational semantics is that state changes in operational semantics are defined by coded algorithms, written in some programming language, whereas in denotational semantics, state changes are defined by mathematical functions.
Let the state s of a program be represented as a set of ordered pairs, as follows:
s = {<i_1, v_1>, <i_2, v_2>, . . . , <i_n, v_n>}
Each i is the name of a variable, and the associated v’s are the current values of those variables. Any of the v’s can have the special value undef, which indicates that its associated variable is currently undefined. Let VARMAP be a function of two parameters: a variable name and the program state. The value of VARMAP(i_j, s) is v_j (the value paired with i_j in state s). Most semantics mapping functions for programs and program constructs map states to states. These state changes are used to define the meanings of programs and program constructs. Some language constructs (for example, expressions) are mapped to values, not states.
Expressions are fundamental to most programming languages. We assume here that expressions have no side effects. Furthermore, we deal with only very simple expressions: The only operators are + and *, and an expression can have at most one operator; the only operands are scalar integer variables and integer literals; there are no parentheses; and the value of an expression is an integer. Following is the BNF description of these expressions:
<expr> → <dec_num> | <var> | <binary_expr>
<binary_expr> → <left_expr> <operator> <right_expr>
<left_expr> → <dec_num> | <var>
<right_expr> → <dec_num> | <var>
<operator> → + | *
The only error we consider in expressions is a variable having an undefined value. Obviously, other errors can occur, but most of them are machine-dependent. Let Z be the set of integers, and let error be the error value. Then Z ∪ {error} is the semantic domain for the denotational specification for our expressions.
The mapping function M_e for a given expression E and state s follows. To distinguish between mathematical function definitions and the assignment statements of programming languages, we use the symbol Δ= to define mathematical functions. The implication symbol, =>, used in this definition connects the form of an operand with its associated case (or switch) construct. Dot notation is used to refer to the child nodes of a node. For example, <binary_expr>.<left_expr> refers to the left child node of <binary_expr>.
M_e(<expr>, s) Δ= case <expr> of
    <dec_num> => M_dec(<dec_num>, s)
    <var> => if VARMAP(<var>, s) == undef
                 then error
                 else VARMAP(<var>, s)
    <binary_expr> =>
        if (M_e(<binary_expr>.<left_expr>, s) == undef OR
            M_e(<binary_expr>.<right_expr>, s) == undef)
            then error
        else if (<binary_expr>.<operator> == '+')
            then M_e(<binary_expr>.<left_expr>, s) +
                 M_e(<binary_expr>.<right_expr>, s)
            else M_e(<binary_expr>.<left_expr>, s) *
                 M_e(<binary_expr>.<right_expr>, s)
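The case analysis in the expression mapping can be sketched as a small evaluator. The Python below makes several assumptions that are not in the text: parse-tree nodes are tuples tagged `"dec"`, `"var"`, or `"bin"`; the state is a dictionary; and `UNDEF` and `ERROR` are string sentinels. One deliberate difference: the operand test compares against `ERROR` rather than undef, because the expression mapping returns a member of Z ∪ {error} and so never returns undef itself.

```python
UNDEF = "undef"   # marks a variable that currently has no value
ERROR = "error"   # the error element of the semantic domain
DIGITS = "0123456789"

def m_dec(digits: str) -> int:
    """Decimal numeral string -> the nonnegative integer it denotes."""
    if not digits or any(ch not in DIGITS for ch in digits):
        raise ValueError(f"not a decimal numeral: {digits!r}")
    if len(digits) == 1:
        return DIGITS.index(digits)
    return 10 * m_dec(digits[:-1]) + DIGITS.index(digits[-1])

def varmap(name, state):
    """VARMAP: the value paired with `name` in `state`."""
    return state[name]

def m_e(expr, state):
    """Meaning of an expression node in a state: an integer or ERROR."""
    kind = expr[0]
    if kind == "dec":                      # <dec_num>
        return m_dec(expr[1])
    if kind == "var":                      # <var>
        v = varmap(expr[1], state)
        return ERROR if v == UNDEF else v
    _, left, op, right = expr              # <binary_expr>
    lv, rv = m_e(left, state), m_e(right, state)
    if lv == ERROR or rv == ERROR:
        return ERROR
    return lv + rv if op == "+" else lv * rv
```

With the state `{"x": 3, "y": UNDEF}`, the node `("bin", ("var", "x"), "*", ("dec", "14"))` evaluates to 42, while any expression that mentions `y` evaluates to `ERROR`.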
An assignment statement is an expression evaluation plus the setting of the target variable to the expression’s value. In this case, the meaning function maps a state to a state. This function can be described with the following:
M_a(x = E, s) Δ= if M_e(E, s) == error
    then error
    else s' = {<i_1', v_1'>, <i_2', v_2'>, . . . , <i_n', v_n'>}, where
        for j = 1, 2, . . . , n
            if i_j == x
                then v_j' = M_e(E, s)
                else v_j' = VARMAP(i_j, s)
Note that the comparison i_j == x in the definition above is of names, not values.
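The assignment mapping builds a new state rather than modifying the old one, which a dictionary comprehension expresses directly. In this sketch the meaning of the right-hand side is passed in as a Python callable from states to values; that parameter, the name `m_a`, and the `ERROR` sentinel are assumptions made so the block stands alone.

```python
ERROR = "error"   # assumed sentinel for the error value

def m_a(target, meaning_of_e, state):
    """Meaning of `target = E` in `state`: the new state s', or ERROR."""
    value = meaning_of_e(state)
    if value == ERROR:
        return ERROR
    # Build s' pair by pair. The test `name == target` compares variable
    # names, not values; every other variable keeps its old value.
    return {name: (value if name == target else old)
            for name, old in state.items()}

before = {"x": 1, "y": 5}
after = m_a("x", lambda s: s["y"] + 1, before)   # models x = y + 1
```

Here `after` is `{"x": 6, "y": 5}` and `before` is unchanged, which matches the view of a statement as a function from states to states.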
The denotational semantics of a logical pretest loop is deceptively simple. To expedite the discussion, we assume that there are two other existing mapping functions, M_sl and M_b, that map statement lists and states to states and Boolean expressions to Boolean values (or error), respectively. The function is
M_l(while B do L, s) Δ= if M_b(B, s) == undef
    then error
    else if M_b(B, s) == false
        then s
        else if M_sl(L, s) == error
            then error
            else M_l(while B do L, M_sl(L, s))
The meaning of the loop is simply the value of the program variables after the statements in the loop have been executed the prescribed number of times, assuming there have been no errors. In essence, the loop has been converted from iteration to recursion, where the recursion control is mathematically defined by other recursive state mapping functions. Recursion is easier to describe with mathematical rigor than iteration.
One significant observation at this point is that this definition, like actual program loops, may compute nothing because of nontermination.
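The conversion of the loop from iteration to recursion can be made concrete with a short sketch. It assumes the meanings of the Boolean expression and of the loop body are supplied as Python callables on states (named `m_b` and `m_sl` here), and that `UNDEF` and `ERROR` are string sentinels; none of these encodings come from the text.

```python
ERROR = "error"
UNDEF = "undef"

def m_l(m_b, m_sl, state):
    """Meaning of `while B do L` in `state`, given the meanings of B and L."""
    test = m_b(state)
    if test == UNDEF:
        return ERROR
    if test is False:
        return state                      # loop exits: the state is the meaning
    after_body = m_sl(state)
    if after_body == ERROR:
        return ERROR
    return m_l(m_b, m_sl, after_body)     # recursion replaces iteration

# Models: while n > 0 do total = total + n; n = n - 1
final = m_l(
    lambda s: s["n"] > 0,
    lambda s: {**s, "total": s["total"] + s["n"], "n": s["n"] - 1},
    {"n": 4, "total": 0},
)
```

The recursive definition has no value for a loop that never terminates. In this Python rendering such a loop would exhaust the interpreter's recursion limit and raise `RecursionError`, which is an artifact of the sketch rather than part of the semantics.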
Objects and functions, such as those used in the earlier constructs, can be defined for the other syntactic entities of programming languages. When a complete system has been defined for a given language, it can be used to determine the meaning of complete programs in that language. This provides a framework for thinking about programming in a highly rigorous way.
As stated previously, denotational semantics can be used as an aid to language design. For example, statements for which the denotational semantic description is complex and difficult may indicate to the designer that such statements may also be difficult for language users to understand and that an alternative design may be in order.
Because of the complexity of denotational descriptions, they are of little use to language users. On the other hand, they provide an excellent way to describe a language concisely.
Although the use of denotational semantics is normally attributed to Scott and Strachey (1971), the general denotational approach to language description can be traced to the nineteenth century (Frege, 1892).
A significant amount of work has been done on the possibility of using denotational language descriptions to generate compilers automatically (Jones, 1980; Milos et al., 1984; Bodwin et al., 1982). These efforts have shown that the method is feasible, but the work has never progressed to the point where it can be used to generate useful compilers.
Axiomatic semantics, thus named because it is based on mathematical logic, is the most abstract approach to semantics specification discussed in this chapter. Rather than directly specifying the meaning of a program, axiomatic semantics specifies what can be proven about the program. Recall that one of the possible uses of semantic specifications is to prove the correctness of programs.
In axiomatic semantics, there is no model of the state of a machine or program or model of state changes that take place when the program is executed. The meaning of a program is based on relationships among program variables and constants, which are the same for every execution of the program.
Axiomatic semantics has two distinct applications: program verification and program semantics specification. This section focuses on program verification in its description of axiomatic semantics.
Axiomatic semantics was defined in conjunction with the development of an approach to proving the correctness of programs. Such correctness proofs, when they can be constructed, show that a program performs the computation described by its specification. In a proof, each statement of a program is both preceded and followed by a logical expression that specifies constraints on program variables. These, rather than the entire state of an abstract machine (as with operational semantics), are used to specify the meaning of the statement. The notation used to describe constraints—indeed, the language of axiomatic semantics—is predicate calculus. Although simple Boolean expressions are often adequate to express constraints, in some cases they are not.
When axiomatic semantics is used to specify formally the meaning of a statement, the meaning is defined by the statement’s effect on assertions about the data affected by the statement.
The logical expressions used in axiomatic semantics are called predicates, or assertions. An assertion immediately preceding a program statement describes the constraints on the program variables at that point in the program. An assertion immediately following a statement describes the new constraints on those variables (and possibly others) after execution of the statement. These assertions are called the precondition and postcondition, respectively, of the statement. For two adjacent statements, the postcondition of the first serves as the precondition of the second. Developing an axiomatic description or proof of a given program requires that every statement in the program has both a precondition and a postcondition.
In the following sections, we examine assertions from the point of view that preconditions for statements are computed from given postconditions, although it is possible to consider these in the opposite sense. We assume all variables are integer type. As a simple example, consider the following assignment statement and postcondition:
sum = 2 * x + 1 {sum > 1}
Precondition and postcondition assertions are presented in braces to distinguish them from parts of program statements. One possible precondition for this statement is {x > 10}.
In axiomatic semantics, the meaning of a specific statement is defined by its precondition and its postcondition. In effect, the two assertions specify precisely the effect of executing the statement.
In the following subsections, we focus on correctness proofs of statements and programs, which is a common use of axiomatic semantics. The more general concept of axiomatic semantics is to state precisely the meaning of statements and programs in terms of logic expressions. Program verification is one application of axiomatic descriptions of languages.
The weakest precondition is the least restrictive precondition that will guarantee the validity of the associated postcondition. For example, in the statement and postcondition given in Section 3.5.3.1, {x > 10}, {x > 50}, and {x > 1000} are all valid preconditions. The weakest of all preconditions in this case is {x > 0}.
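The difference between a valid precondition and the weakest one can be checked by brute force over a small range of integers. The sketch below is a bounded test, not a proof; the helper `valid`, the tested range, and the encoding of the statement `sum = 2 * x + 1` as a function of x are all assumptions made for illustration.

```python
def valid(pre, stmt, post, values=range(-50, 51)):
    """Check {pre} stmt {post} for every tested x that satisfies pre."""
    return all(post(stmt(x)) for x in values if pre(x))

def stmt(x):
    """sum = 2 * x + 1, returning the new value of sum."""
    return 2 * x + 1

def post(total):
    """The postcondition {sum > 1}."""
    return total > 1

# Starting values for which the postcondition actually holds afterward.
good_starts = {x for x in range(-50, 51) if post(stmt(x))}

stronger_ok = valid(lambda x: x > 10, stmt, post)    # valid, but not weakest
weakest_ok = valid(lambda x: x > 0, stmt, post)      # valid and weakest
too_weak_ok = valid(lambda x: x >= 0, stmt, post)    # admits x = 0, which fails
```

Over the tested range, `good_starts` is exactly the set of x with x > 0, so no valid precondition can admit any other starting value; that is what makes {x > 0} the weakest.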
If the weakest precondition can be computed from the most general postcondition for each of the statement types of a language, then the processes used to compute these preconditions provide a concise description of the semantics of that language. Furthermore, correctness proofs can be constructed for programs in that language. A program proof is begun by using the characteristics of the results of the program’s execution as the postcondition of the last statement of the program. This postcondition, along with the last statement, is used to compute the weakest precondition for the last statement. This precondition is then used as the postcondition for the second last statement. This process continues until the beginning of the program is reached. At that point, the precondition of the first statement states the conditions under which the program will compute the desired results. If these conditions are implied by the input specification of the program, the program has been verified to be correct.
An inference rule is a method of inferring the truth of one assertion on the basis of the values of other assertions. The general form of an inference rule is as follows:
S1, S2, . . . , Sn
------------------
        S
This rule states that if S1, S2, . . . , and Sn are true, then the truth of S can be inferred. The top part of an inference rule is called its antecedent; the bottom part is called its consequent.
An axiom is a logical statement that is assumed to be true. Therefore, an axiom is an inference rule without an antecedent.
For some program statements, the computation of a weakest precondition from the statement and a postcondition is simple and can be specified by an axiom. In most cases, however, the weakest precondition can be specified only by an inference rule.
To use axiomatic semantics with a given programming language, whether for correctness proofs or for formal semantics specifications, either an axiom or an inference rule must exist for each kind of statement in the language. In the following subsections, we present an axiom for assignment statements and inference rules for statement sequences, selection statements, and logical pretest loop statements. Note that we assume that neither arithmetic nor Boolean expressions have side effects.
The precondition and postcondition of an assignment statement together define its meaning. To define the meaning of an assignment statement there must be a way to compute its precondition from its postcondition.
Let x = E be a general assignment statement and Q be its postcondition. Then, its weakest precondition, P, is defined by the axiom
P = Q_{x→E}
which means that P is computed as Q with all instances of x replaced by E. For example, if we have the assignment statement and postcondition
a = b / 2 - 1 {a < 10}
the weakest precondition is computed by substituting b / 2 - 1 for a in the postcondition {a < 10}, as follows:
b / 2 - 1 < 10
b < 22
Thus, the weakest precondition for the given assignment statement and postcondition is {b < 22}. Remember that the assignment axiom is guaranteed to be correct only in the absence of side effects. An assignment statement has a side effect if it changes some variable other than its target.
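The substitution that the assignment axiom calls for is purely syntactic, so it can be sketched with Python's standard `ast` module, treating assertions and right-hand sides as Python expression strings. This assumes Python 3.9 or later (for `ast.unparse`); the names `wp_assign` and `_Substitute` are invented here, and the sketch performs only the substitution, not the algebraic simplification to a form such as b < 22.

```python
import ast
import copy

class _Substitute(ast.NodeTransformer):
    """Replace every use of one variable name by an expression tree."""

    def __init__(self, name, replacement):
        self.name = name
        self.replacement = replacement

    def visit_Name(self, node):
        if node.id == self.name:
            return copy.deepcopy(self.replacement)
        return node

def wp_assign(target, expr, post):
    """Return `post` with all instances of `target` replaced by `expr`."""
    replacement = ast.parse(expr, mode="eval").body
    tree = ast.parse(post, mode="eval")
    new_tree = _Substitute(target, replacement).visit(tree)
    return ast.unparse(ast.fix_missing_locations(new_tree))

# a = b / 2 - 1 with postcondition {a < 10}
pre = wp_assign("a", "b / 2 - 1", "a < 10")
```

The resulting string is the assertion obtained by putting `b / 2 - 1` in place of `a`. Evaluating it for sample values of b agrees with the simplified form b < 22, which is how the accompanying test checks it.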
The usual notation for specifying the axiomatic semantics of a given statement form is
{P} S {Q}
where P is the precondition, Q is the postcondition, and S is the statement form. In the case of the assignment statement, the notation is
{Q_{x→E}} x = E {Q}
As another example of computing a precondition for an assignment statement, consider the following:
x = 2 * y - 3 {x > 25}
The precondition is computed as follows:
2 * y - 3 > 25
y > 14
So {y > 14} is the weakest precondition for this assignment statement and postcondition.
Note that the appearance of the left side of the assignment statement in its right side does not affect the process of computing the weakest precondition. For example, for
x = x + y - 3 {x > 10}
the weakest precondition is
x + y - 3 > 10
y > 13 - x
Recall that axiomatic semantics was developed to prove the correctness of programs. In light of that, it is natural at this point to wonder how the axiom for assignment statements can be used to prove anything. Here is how: A given assignment statement with both a precondition and a postcondition can be considered a logical statement, or theorem. If the assignment axiom, when applied to the postcondition and the assignment statement, produces the given precondition, the theorem is proved. For example, consider the following logical statement:
{x > 3} x = x - 3 {x > 0}
Using the assignment axiom on the statement and its postcondition produces {x > 3}, which is the given precondition. Therefore, we have proven the example logical statement.
Next, consider the following logical statement:
{x > 5} x = x - 3 {x > 0}
In this case, the given precondition, {x > 5}, is not the same as the assertion produced by the axiom. However, it is obvious that {x > 5} implies {x > 3}. To use this in a proof, an inference rule named the rule of consequence is needed. The form of the rule of consequence is
{P} S {Q}, P' => P, Q => Q'
---------------------------
        {P'} S {Q'}
The => symbol means “implies,” and S can be any program statement. The rule can be stated as follows: If the logical statement {P} S {Q} is true, the assertion P' implies the assertion P, and the assertion Q implies the assertion Q', then it can be inferred that {P'} S {Q'}. In other words, the rule of consequence says that a postcondition can always be weakened and a precondition can always be strengthened. This is quite useful in program proofs. For example, it allows the completion of the proof of the last logical statement example above. If we let P be {x > 3}, Q and Q' be {x > 0}, and P' be {x > 5}, we have
{x > 3} x = x - 3 {x > 0}, (x > 5) => (x > 3), (x > 0) => (x > 0)
-----------------------------------------------------------------
                    {x > 5} x = x - 3 {x > 0}
The first term of the antecedent ({x > 3} x = x - 3 {x > 0}) was proven with the assignment axiom. The second and third terms are obvious. Therefore, by the rule of consequence, the consequent is true.
The weakest precondition for a sequence of statements cannot be described by an axiom, because the precondition depends on the particular kinds of statements in the sequence. In this case, the precondition can only be described with an inference rule. Let S1 and S2 be adjacent program statements. If S1 and S2 have the following pre- and postconditions
{P1} S1 {P2}
{P2} S2 {P3}
the inference rule for such a two-statement sequence is
{P1} S1 {P2}, {P2} S2 {P3}
--------------------------
     {P1} S1; S2 {P3}
So, for our example, {P1} S1; S2 {P3} describes the axiomatic semantics of the sequence S1; S2. The inference rule states that to get the sequence precondition, the precondition of the second statement is computed. This new assertion is then used as the postcondition of the first statement, which can then be used to compute the precondition of the first statement, which is also the precondition of the whole sequence. If S1 and S2 are the assignment statements
x1 = E1
and
x2 = E2
then we have
{P3_{x2→E2}} x2 = E2 {P3}
{(P3_{x2→E2})_{x1→E1}} x1 = E1 {P3_{x2→E2}}
Therefore, the weakest precondition for the sequence x1 = E1; x2 = E2 with postcondition P3 is
{(P3_{x2→E2})_{x1→E1}}
For example, consider the following sequence and postcondition:
y = 3 * x + 1;
x = y + 3;
{x < 10}
The precondition for the second assignment statement is
y < 7
which is used as the postcondition for the first statement. The precondition for the first assignment statement can now be computed:
3 * x + 1 < 7
x < 2
So, {x < 2} is the precondition of both the first statement and the two-statement sequence.
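The backward computation for this sequence can be cross-checked by running the two assignments forward. The sketch below is an illustration under the assumption that x is an integer drawn from a finite sample; the function names are hypothetical. It confirms that, on the sample, the postcondition x < 10 holds after the sequence exactly when x < 2 held before it.

```python
def run_sequence(x):
    """Execute y = 3 * x + 1; x = y + 3 and return the final (x, y)."""
    y = 3 * x + 1
    x = y + 3
    return x, y


def computed_precondition(x):
    """The precondition derived in the text: {x < 2}."""
    return x < 2


samples = range(-100, 101)  # assumed finite sample of initial integer values

# For each initial x, the precondition should agree with the postcondition
# evaluated on the final state.
agrees = all(
    computed_precondition(x0) == (run_sequence(x0)[0] < 10) for x0 in samples
)
print(agrees)
```

Agreement in both directions is what distinguishes a weakest precondition from one that is merely sufficient.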
We next consider the inference rule for selection statements, the general form of which is
if B then S1 else S2
We consider only selections that include else clauses. The inference rule is

$$\frac{\{B \text{ and } P\}\ S1\ \{Q\},\quad \{(\text{not } B) \text{ and } P\}\ S2\ \{Q\}}{\{P\}\ \text{if } B \text{ then } S1 \text{ else } S2\ \{Q\}}$$
This rule specifies that selection statements must be proven both when the Boolean control expression is true and when it is false. The first logical statement above the line represents the then clause; the second represents the else clause. According to the inference rule, we need a precondition P that can be used in the precondition of both the then and else clauses.
Consider the following example of the computation of the precondition using the selection inference rule. The example selection statement is
if x > 0 then
y = y - 1
else
y = y + 1
Suppose the postcondition, Q, for this selection statement is {y > 0}. We can use the axiom for assignment on the then clause
y = y - 1 {y > 0}
This produces {y - 1 > 0} or {y > 1}. It can be used as the P part of the precondition for the then clause. Now we apply the same axiom to the else clause
y = y + 1 {y > 0}
which produces the precondition {y + 1 > 0} or {y > -1}. Because {y > 1} => {y > -1}, the rule of consequence allows us to use {y > 1} for the precondition of the whole selection statement.
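A quick way to gain confidence in the precondition {y > 1} is to execute the selection statement on sample states. The sketch below assumes integer x and y over a small grid and uses a hypothetical function name; it checks only that {y > 1} is sufficient for the postcondition {y > 0} on both branches.

```python
def select(x, y):
    """Run the example selection statement and return the final value of y."""
    if x > 0:
        y = y - 1
    else:
        y = y + 1
    return y


# Assumed finite grid of initial states (x, y).
states = [(x, y) for x in range(-10, 11) for y in range(-10, 11)]

# Every state satisfying the precondition y > 1 must end with y > 0,
# whichever branch is taken.
sufficient = all(select(x, y) > 0 for x, y in states if y > 1)
print(sufficient)
```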
Another essential construct of imperative programming languages is the logical pretest, or while loop. Computing the weakest precondition for a while loop is inherently more difficult than for a sequence, because the number of iterations cannot always be predetermined. In a case where the number of iterations is known, the loop can be unrolled and treated as a sequence.
The problem of computing the weakest precondition for loops is similar to the problem of proving a theorem about all positive integers. In the latter case, induction is normally used, and the same inductive method can be used for some loops. The principal step in induction is finding an inductive hypothesis. The corresponding step in the axiomatic semantics of a while loop is finding an assertion called a loop invariant, which is crucial to finding the weakest precondition.
The inference rule for computing the precondition for a while loop is as follows:

$$\frac{\{I \text{ and } B\}\ S\ \{I\}}{\{I\}\ \text{while } B \text{ do } S \text{ end}\ \{I \text{ and } (\text{not } B)\}}$$

In this rule, I is the loop invariant. This seems simple, but it is not. The complexity lies in finding an appropriate loop invariant.
The axiomatic description of a while loop is written as
{P} while B do S end {Q}
The loop invariant must satisfy a number of requirements to be useful. First, the weakest precondition for the while loop must guarantee the truth of the loop invariant. In turn, the loop invariant must guarantee the truth of the postcondition upon loop termination. These constraints move us from the inference rule to the axiomatic description. During execution of the loop, the truth of the loop invariant must be unaffected by the evaluation of the loop-controlling Boolean expression and the loop body statements. Hence, the name invariant.
Another complicating factor for while loops is the question of loop termination. A loop that does not terminate cannot be correct, and in fact computes nothing. If Q is the postcondition that holds immediately after loop exit, then a precondition P for the loop is one that guarantees Q at loop exit and also guarantees that the loop terminates.
The complete axiomatic description of a while construct requires all of the following to be true, in which I is the loop invariant:
P => I
{I and B} S { I }
(I and (not B)) => Q
the loop terminates
If a loop computes a sequence of numeric values, it may be possible to find a loop invariant using an approach that is used for determining the inductive hypothesis when mathematical induction is used to prove a statement about a mathematical sequence. The relationship between the number of iterations and the precondition for the loop body is computed for a few cases, with the hope that a pattern emerges that will apply to the general case. It is helpful to treat the process of producing a weakest precondition as a function, wp. In general
wp(statement, postcondition) = precondition
A wp function is often called a predicate transformer, because it takes a predicate, or assertion, as a parameter and returns another predicate.
To find I, the loop postcondition Q is used to compute preconditions for several different numbers of iterations of the loop body, starting with none. If the loop body contains a single assignment statement, the axiom for assignment statements can be used to compute these cases. Consider the example loop:
while y <> x do y = y + 1 end {y = x}
Remember that the equal sign is being used for two different purposes here. In assertions, it means mathematical equality; outside assertions, it means the assignment operator.
For zero iterations, the weakest precondition is, obviously,
{y = x}
For one iteration, it is
wp(y = y + 1, {y = x}) = {y + 1 = x}, or {y = x - 1}
For two iterations, it is
wp(y = y + 1, {y = x - 1}) = {y + 1 = x - 1}, or {y = x - 2}
For three iterations, it is
wp(y = y + 1, {y = x - 2}) = {y + 1 = x - 2}, or {y = x - 3}
It is now obvious that {y < x} will suffice for cases of one or more iterations. Combining this with {y = x} for the zero iterations case, we get {y <= x}, which can be used for the loop invariant. A precondition for the while statement can be determined from the loop invariant. In fact, I can be used as the precondition, P.
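The repeated use of wp on the assignment y = y + 1 can be mimicked with ordinary functions. In the sketch below, a predicate is modeled as a Python function from a state (a dict of variable values) to a Boolean, and `wp_assign` is a hypothetical helper that evaluates the postcondition in the updated state. This modeling choice is an assumption made for illustration, not notation from the text.

```python
def wp_assign(update, post):
    """Weakest precondition of an assignment: post evaluated in the updated state."""
    return lambda state: post(update(state))


def increment_y(state):
    """Effect of y = y + 1 on a state held as a dict of variable values."""
    return {**state, "y": state["y"] + 1}


def post(state):
    """Loop postcondition {y = x}."""
    return state["y"] == state["x"]


# Apply the predicate transformer once per loop iteration, working backward.
one_iteration = wp_assign(increment_y, post)               # y + 1 = x
two_iterations = wp_assign(increment_y, one_iteration)     # y + 2 = x
three_iterations = wp_assign(increment_y, two_iterations)  # y + 3 = x

print(one_iteration({"x": 5, "y": 4}),
      two_iterations({"x": 5, "y": 3}),
      three_iterations({"x": 5, "y": 2}))
```

Each returned predicate holds exactly when y trails x by the matching number of iterations, which is the pattern that suggests {y <= x} as the invariant.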
We must ensure that our choice satisfies the four criteria for I for our example loop. First, because P = I, P => I. The second requirement is that the following must be true:
{I and B} S {I}
In our example, we have
{y <= x and y <> x} y = y + 1 {y <= x}
Applying the assignment axiom to
y = y + 1 {y <= x}
we get {y + 1 <= x}, which is equivalent to {y < x}, which is implied by {y <= x and y <> x}. So, the earlier statement is proven.
Next, we must have
{I and (not B)} => Q
In our example, we have
{(y <= x) and not (y <> x)} => {y = x}
{(y <= x) and (y = x)} => {y = x}
{y = x} => {y = x}
So, this is obviously true. Next, loop termination must be considered. In this example, the question is whether the loop
{y <= x} while y <> x do y = y + 1 end {y = x}
terminates. Recalling that x and y are assumed to be integer variables, it is easy to see that this loop does terminate. The precondition guarantees that y initially is not larger than x. The loop body increments y with each iteration, until y is equal to x. No matter how much smaller y is than x initially, it will eventually become equal to x. So the loop will terminate. Because our choice of I satisfies all four criteria, it is a satisfactory loop invariant and loop precondition.
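The criteria for I = {y <= x} can also be exercised mechanically on sample states. The following sketch assumes integer x and y over a small grid, and all function names are hypothetical. It checks that the loop body preserves the invariant, that the invariant together with the negated loop condition gives the postcondition, and that the loop halts within a step limit whenever the precondition holds. The first criterion, P => I, is immediate because P is chosen to be I.

```python
def invariant(x, y):
    return y <= x  # I


def guard(x, y):
    return y != x  # B


def body(x, y):
    return x, y + 1  # S: y = y + 1


def postcondition(x, y):
    return y == x  # Q


# Assumed finite grid of integer states.
states = [(x, y) for x in range(-10, 11) for y in range(-10, 11)]

# {I and B} S {I}: the body keeps the invariant true.
preserved = all(
    invariant(*body(x, y)) for x, y in states if invariant(x, y) and guard(x, y)
)

# (I and (not B)) => Q: the invariant plus loop exit gives the postcondition.
exit_gives_q = all(
    postcondition(x, y) for x, y in states if invariant(x, y) and not guard(x, y)
)


def iterations(x, y, limit=1000):
    """Count loop iterations, or return None if the step limit is reached."""
    count = 0
    while guard(x, y):
        if count >= limit:
            return None
        x, y = body(x, y)
        count += 1
    return count


# Termination: every sampled state satisfying the precondition halts.
terminates = all(iterations(x, y) is not None for x, y in states if invariant(x, y))
print(preserved, exit_gives_q, terminates)
```

Starting from a state that violates the precondition, such as x = 0 and y = 1, the same function reaches its step limit, which illustrates why the precondition matters for termination.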
The previous process used to compute the invariant for a loop does not always produce an assertion that is the weakest precondition (although it does in the example).
As another example of finding a loop invariant using the approach used in mathematical induction, consider the following loop statement:
while s > 1 do s = s / 2 end {s = 1}
As before, we use the assignment axiom to try to find a loop invariant and a precondition for the loop. For zero iterations, the weakest precondition is {s = 1}.
For one iteration, it is
wp(s = s / 2, {s = 1}) = {s / 2 = 1}, or {s = 2}
For two iterations, it is
wp(s = s / 2, {s = 2}) = {s / 2 = 2}, or {s = 4}
For three iterations, it is
wp(s = s / 2, {s = 4}) = {s / 2 = 4}, or {s = 8}
From these cases, we can see clearly that the invariant is
{s is a nonnegative power of 2}
Once again, the computed I can serve as P, and I passes the four requirements. Unlike our earlier example of finding a loop precondition, this one clearly is not a weakest precondition. Consider using the precondition {s > 1}. The logical statement
{s > 1} while s > 1 do s = s / 2 end {s = 1}
can easily be proven, and this precondition is significantly broader than the one computed earlier. The loop and precondition are satisfied for any positive value for s, not just powers of 2, as the process indicates. Because of the rule of consequence, using a precondition that is stronger than the weakest precondition does not invalidate a proof.
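The claim that the loop works for any positive s, and not only for powers of 2, is easy to test. The sketch below makes two assumptions that the text does not state here: s is an integer, and / truncates, which is written as // in Python. The function name is hypothetical.

```python
def halve_until_one(s):
    """Run: while s > 1 do s = s / 2 end, with truncating integer division."""
    while s > 1:
        s = s // 2  # assumption: integer division that discards the remainder
    return s


# Every positive starting value in the sample ends with s = 1,
# including values such as 3 or 6 that are not powers of 2.
all_reach_one = all(halve_until_one(s) == 1 for s in range(1, 1000))
print(all_reach_one)
```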
Finding loop invariants is not always easy. It is helpful to understand the nature of these invariants. First, a loop invariant is a weakened version of the loop postcondition and also a precondition for the loop. So, I must be weak enough to be satisfied prior to the beginning of loop execution, but when combined with the loop exit condition, it must be strong enough to force the truth of the postcondition.
Because of the difficulty of proving loop termination, that requirement is often ignored. If loop termination can be shown, the axiomatic description of the loop is called total correctness. If the other conditions can be met but termination is not guaranteed, it is called partial correctness.
In more complex loops, finding a suitable loop invariant, even for partial correctness, requires a good deal of ingenuity. Because computing the precondition for a while loop depends on finding a loop invariant, proving the correctness of programs with while loops using axiomatic semantics can be difficult.
This section provides validations for two simple programs. The first example of a correctness proof is for a very short program, consisting of a sequence of three assignment statements that interchange the values of two variables.
{x = A AND y = B}
t = x;
x = y;
y = t;
{x = B AND y = A}
Because the program consists entirely of assignment statements in a sequence, the assignment axiom and the inference rule for sequences can be used to prove its correctness. The first step is to use the assignment axiom on the last statement and the postcondition for the whole program. This yields the precondition
{x = B AND t = A}
Next, we use this new precondition as a postcondition on the middle statement and compute its precondition, which is
{y = B AND t = A}
Next, we use this new assertion as the postcondition on the first statement and apply the assignment axiom, which yields
{y = B AND x = A}
which is the same as the precondition on the program, except for the order of operands on the AND operator. Because AND is a symmetric operator, our proof is complete.
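The proved property of the three-statement program can be confirmed by running it. The sketch below assumes integer values for A and B over a small range and wraps the three assignments in a hypothetical function named `swap`.

```python
def swap(x, y):
    """The three-assignment program from the text, returning the final (x, y)."""
    t = x
    x = y
    y = t
    return x, y


# Precondition {x = A AND y = B}; postcondition {x = B AND y = A}.
samples = [(a, b) for a in range(-5, 6) for b in range(-5, 6)]
swap_ok = all(swap(a, b) == (b, a) for a, b in samples)
print(swap_ok)
```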
The following example is a proof of correctness of a pseudocode program that computes the factorial function.
{n >= 0}
count = n;
fact = 1;
while count <> 0 do
fact = fact * count;
count = count - 1;
end
{fact = n!}
The method described earlier for finding the loop invariant does not work for the loop in this example. Some ingenuity is required here, which can be aided by a brief study of the code. The loop computes the factorial function in order of the last multiplication first; that is, (n - 1) * n is done first, assuming n is greater than 1. So, part of the invariant can be the following:
fact = (count + 1) * (count + 2) * . . . * (n - 1) * n
But we also must ensure that count is always nonnegative, which we can do by adding that to the assertion above, to get the following:
I = (fact = (count + 1) * . . . * n) AND (count >= 0)
Next, we must confirm that this I meets the requirements for invariants. Once again we let I also be used for P, so P clearly implies I. The next question is
{I and B} S {I}
I and B is the following:
((fact = (count + 1) * . . . * n) AND (count >= 0)) AND
(count <> 0)
This reduces to
(fact = (count + 1) * . . . * n) AND (count > 0)
In our case, we must compute the precondition of the body of the loop, using the invariant for the postcondition. For
{P} count = count - 1 {I}
we compute P to be
{(fact = count * (count + 1) * . . . * n) AND
(count >= 1)}
Using this as the postcondition for the first assignment in the loop body,
{P} fact = fact * count {(fact = count * (count + 1)
* . . . * n) AND (count >= 1)}
In this case, P is
{(fact = (count + 1) * . . . * n) AND (count >= 1)}
It is clear that I and B implies this P, so by the rule of consequence,
{I AND B} S {I}
is true. Finally, the last test of I is
I AND (NOT B) => Q
For our example, this is
(((fact = (count + 1) * . . . * n) AND (count >= 0)) AND
(count = 0)) => fact = n!
This is clearly true, for when count = 0, the first part is precisely the definition of factorial. So, our choice of I meets the requirements for a loop invariant. Now we can use our P (which is the same as I) from the while as the postcondition on the second assignment of the program
{P} fact = 1 {(fact = (count + 1) * . . . * n) AND
(count >= 0)}
which yields for P
{(1 = (count + 1) * . . . * n) AND (count >= 0)}
Using this as the postcondition for the first assignment in the code
{P} count = n {(1 = (count + 1) * . . . * n) AND
(count >= 0)}
produces for P
{((n + 1) * . . . * n = 1) AND (n >= 0)}
The left operand of the AND operator is true (because 1 = 1) and the right operand is exactly the precondition of the whole code segment, {n >= 0}. Therefore, the program has been proven to be correct.
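The invariant used in this proof can be turned into runtime assertions. The sketch below is an illustration only: it transcribes the pseudocode into Python, expresses the product (count + 1) * . . . * n with `math.prod` over a range (an empty range gives 1, matching the empty product), and checks I before the loop and after every iteration. The function names are hypothetical.

```python
from math import factorial, prod


def fact_with_invariant(n):
    """Factorial program from the text, with the loop invariant checked at run time."""
    assert n >= 0  # precondition {n >= 0}
    count = n
    fact = 1

    def invariant():
        # I = (fact = (count + 1) * ... * n) AND (count >= 0)
        return fact == prod(range(count + 1, n + 1)) and count >= 0

    assert invariant()  # P => I, with P chosen to be I
    while count != 0:
        fact = fact * count
        count = count - 1
        assert invariant()  # {I and B} S {I}
    assert fact == factorial(n)  # (I and (not B)) => Q
    return fact


results = [fact_with_invariant(n) for n in range(8)]
print(results)
```

Runtime checks of this kind cover only the inputs that are actually executed, so they complement the proof rather than replace it.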
As stated previously, to define the semantics of a complete programming language using the axiomatic method, there must be an axiom or an inference rule for each statement type in the language. Defining axioms or inference rules for some of the statements of programming languages has proven to be a difficult task. An obvious solution to this problem is to design the language with the axiomatic method in mind, so that only statements for which axioms or inference rules can be written are included. Unfortunately, such a language would necessarily leave out some useful and powerful statements.
Axiomatic semantics is a powerful tool for research into program correctness proofs, and it provides an excellent framework in which to reason about programs, both during their construction and later. Its usefulness in describing the meaning of programming languages to language users and compiler writers is, however, highly limited.
Backus-Naur Form and context-free grammars are equivalent metalanguages that are well suited for the task of describing the syntax of programming languages. Not only are they concise descriptive tools, but also the parse trees that can be associated with their generative actions give graphical evidence of the underlying syntactic structures. Furthermore, they are naturally related to recognition devices for the languages they generate, which leads to the relatively easy construction of syntax analyzers for compilers for these languages.
An attribute grammar is a descriptive formalism that can describe both the syntax and static semantics of a language. Attribute grammars are extensions to context-free grammars. An attribute grammar consists of a grammar, a set of attributes, a set of attribute computation functions, and a set of predicates that describe static semantics rules.
This chapter provides brief introductions to three methods of semantic description: operational, denotational, and axiomatic. Operational semantics is a method of describing the meaning of language constructs in terms of their effects on an ideal machine. In denotational semantics, mathematical objects are used to represent the meanings of language constructs. Language entities are converted to these mathematical objects with recursive functions. Axiomatic semantics, which is based on formal logic, was devised as a tool for proving the correctness of programs.
Syntax description using context-free grammars and BNF is thoroughly discussed in Cleaveland and Uzgalis (1976).
Research in axiomatic semantics was begun by Floyd (1967) and further developed by Hoare (1969). The semantics of a large part of Pascal was described by Hoare and Wirth (1973) using this method. The parts they did not complete involved functional side effects and goto statements. These were found to be the most difficult to describe.
The technique of using preconditions and postconditions during the development of programs is described (and advocated) by Dijkstra (1976) and also discussed in detail in Gries (1981).
Good introductions to denotational semantics can be found in Gordon (1979) and Stoy (1977). Introductions to all of the semantics description methods discussed in this chapter can be found in Marcotty et al. (1976). Another good reference for much of the chapter material is Pagan (1981). The form of the denotational semantic functions in this chapter is similar to that found in Meyer (1990).
Define syntax and semantics.
Who are language descriptions for?
Describe the operation of a general language generator.
Describe the operation of a general language recognizer.
What is the difference between a sentence and a sentential form?
Define a left-recursive grammar rule.
What three extensions are common to most EBNFs?
Distinguish between static and dynamic semantics.
What purpose do predicates serve in an attribute grammar?
What is the difference between a synthesized and an inherited attribute?
How is the order of evaluation of attributes determined for the trees of a given attribute grammar?
What is the primary use of attribute grammars?
Explain the primary uses of a methodology and notation for describing the semantics of programming languages.
Why can machine languages not be used to define statements in operational semantics?
Describe the two levels of uses of operational semantics.
In denotational semantics, what are the syntactic and semantic domains?
What is stored in the state of a program for denotational semantics?
Which semantics approach is most widely known?
What two things must be defined for each language entity in order to construct a denotational description of the language?
Which part of an inference rule is the antecedent?
What is a predicate transformer function?
What does partial correctness mean for a loop construct?
On what branch of mathematics is axiomatic semantics based?
On what branch of mathematics is denotational semantics based?
What is the problem with using a software pure interpreter for operational semantics?
Explain what the preconditions and postconditions of a given statement mean in axiomatic semantics.
Describe the approach of using axiomatic semantics to prove the correctness of a given program.
Describe the basic concept of denotational semantics.
In what fundamental way do operational semantics and denotational semantics differ?
The two mathematical models of language description are generation and recognition. Describe how each can define the syntax of a programming language.
Write EBNF descriptions for the following:
A Java class definition header statement
A Java method call statement
A C switch statement
A C union definition
C float literals
Rewrite the BNF of Example 3.4 to give + precedence over * and force + to be right associative.
Rewrite the BNF of Example 3.4 to add the ++ and -- unary operators of Java.
Write a BNF description of the Boolean expressions of Java, including the three operators &&, ||, and ! and the relational expressions.
Using the grammar in Example 3.2, show a parse tree and a leftmost derivation for each of the following statements:
A = A * (B + (C * A))
B = C * (A * C + B)
A = A * (B + (C))
Using the grammar in Example 3.4, show a parse tree and a leftmost derivation for each of the following statements:
A = (A + B) * C
A = B + C + A
A = A * (B + C)
A = B * (C * (A + B))
Prove that the following grammar is ambiguous:
<S> → <A>
<A> → <A> + <A> | <id>
<id> → a | b | c
Modify the grammar of Example 3.4 to add a unary minus operator that has higher precedence than either + or *.
Describe, in English, the language defined by the following grammar:
<S> → <A> <B> <C>
<A> → a <A> | a
<B> → b <B> | b
<C> → c <C> | c
Consider the following grammar:
<S> → <A> a <B> b
<A> → <A> b | b
<B> → a <B> | a
Which of the following sentences are in the language generated by this grammar?
baab
bbbab
bbaaaaaS
bbaab
Consider the following grammar:
<S> → a <S> c <B> | <A> | b
<A> → c <A> | c
<B> → d | <A>
Which of the following sentences are in the language generated by this grammar?
abcd
acccbd
acccbcc
acd
accc
Write a grammar for the language consisting of strings that have n copies of the letter a followed by the same number of copies of the letter b, where n > 0. For example, the strings ab, aaaabbbb, and aaaaaaaabbbbbbbb are in the language but a, abb, ba, and aaabb are not.
Draw parse trees for the sentences aabb and aaaabbbb, as derived from the grammar of Problem 13.
Convert the BNF of Example 3.1 to EBNF.
Convert the BNF of Example 3.3 to EBNF.
Convert the following EBNF to BNF:
What is the difference between an intrinsic attribute and a nonintrinsic synthesized attribute?
Write an attribute grammar whose BNF basis is that of Example 3.6 in Section 3.4.5 but whose language rules are as follows: Data types cannot be mixed in expressions, but assignment statements need not have the same types on both sides of the assignment operator.
Write an attribute grammar whose base BNF is that of Example 3.2 and whose type rules are the same as for the assignment statement example of Section 3.4.5.
Using the virtual machine instructions given in Section 3.5.1.1, give an operational semantic definition of the following:
Java do-while
Ada for
C++ if-then-else
C for
C switch
Write a denotational semantics mapping function for the following statements:
Ada for
Java do-while
Java Boolean expressions
Java for
C switch
Compute the weakest precondition for each of the following assignment statements and postconditions:
a = 2 * (b - 1) - 1 {a > 0}
b = (c + 10) / 3 {b > 6}
a = a + 2 * b - 1 {a > 1}
x = 2 * y + x - 1 {x > 11}
Compute the weakest precondition for each of the following sequences of assignment statements and their postconditions:
a = 2 * b + 1;
b = a - 3
{b < 0}
a = 3 * (2 * b + a);
b = 2 * a - 1
{b > 5}
Compute the weakest precondition for each of the following selection constructs and their postconditions:
if (a == b)
b = 2 * a + 1
else
b = 2 * a;
{b > 1}
if (x < y)
x = x + 1
else
x = 3 * x
{x < 0}
if (x > y)
y = 2 * x + 1
else
y = 3 * x - 1;
{y > 3}
Explain the four criteria for proving the correctness of a logical pretest loop construct of the form while B do S end.
Prove that
Prove the following program is correct:
{n > 0}
count = n;
sum = 0;
while count <> 0 do
sum = sum + count;
count = count - 1;
end
{sum = 1 + 2 + . . . + n}
This chapter begins with an introduction to lexical analysis, along with a simple example. Next, the general parsing problem is discussed, including the two primary approaches to parsing, and the complexity of parsing. Then, we introduce the recursive-descent implementation technique for top-down parsers, including examples of parts of a recursive-descent parser and a trace of a parse using one. The last section discusses bottom-up parsing and the LR parsing algorithm. This section includes an example of a small LR parsing table and the parse of a string using the LR parsing process.
A serious investigation of compiler design requires at least a semester of intensive study, including the design and implementation of a compiler for a small but realistic programming language. The first part of such a course is devoted to lexical and syntax analyses. The syntax analyzer is the heart of a compiler, because several other important components, including the semantic analyzer and the intermediate code generator, are driven by the actions of the syntax analyzer.
Some readers may wonder why a chapter on any part of a compiler would be included in a book on programming languages. There are at least two reasons to include a discussion of lexical and syntax analyses in this text: First, syntax analyzers are based directly on the grammars discussed in Chapter 3, so it is natural to discuss them as an application of grammars. Second, lexical and syntax analyzers are needed in numerous situations outside compiler design. Many applications, among them program listing formatters, programs that compute the complexity of programs, and programs that must analyze and react to the contents of a configuration file, all need to do lexical and syntax analyses. Therefore, lexical and syntax analyses are important topics for software developers, even if they never need to write a compiler. Furthermore, some computer science programs no longer require students to take a compiler design course, which leaves students with no instruction in lexical or syntax analysis. In those cases, this chapter can be covered in the programming language course. In degree programs that require a compiler design course, this chapter can be skipped.
Three different approaches to implementing programming languages are introduced in Chapter 1: compilation, pure interpretation, and hybrid implementation. The compilation approach uses a program called a compiler, which translates programs written in a high-level programming language into machine code. Compilation is typically used to implement programming languages that are used for large applications, often written in languages such as C++ and COBOL. Pure interpretation systems perform no translation; rather, programs are interpreted in their original form by a software interpreter. Pure interpretation is usually used for smaller systems in which execution efficiency is not critical, such as scripts embedded in HTML documents, written in languages such as JavaScript. Hybrid implementation systems translate programs written in high-level languages into intermediate forms, which are interpreted. These systems are now more widely used than ever, thanks in large part to the popularity of scripting languages. Traditionally, hybrid systems have resulted in much slower program execution than compiler systems. However, in recent years the use of Just-in-Time (JIT) compilers has become widespread, particularly for Java programs and programs written for the Microsoft .NET system. A JIT compiler, which translates intermediate code to machine code, is used on methods at the time they are first called. In effect, a JIT compiler transforms a hybrid system to a delayed compiler system.
All three of the implementation approaches just discussed use both lexical and syntax analyzers.
Syntax analyzers, or parsers, are nearly always based on a formal description of the syntax of programs. The most commonly used syntax-description formalism is context-free grammars, or BNF, which is introduced in Chapter 3. Using BNF, as opposed to using some informal syntax description, has at least three compelling advantages. First, BNF descriptions of the syntax of programs are clear and concise, both for humans and for software systems that use them. Second, the BNF description can be used as the direct basis for the syntax analyzer. Third, implementations based on BNF are relatively easy to maintain because of their modularity.
Nearly all compilers separate the task of analyzing syntax into two distinct parts, lexical analysis and syntax analysis, although this terminology is confusing. The lexical analyzer deals with small-scale language constructs, such as names and numeric literals. The syntax analyzer deals with the large-scale constructs, such as expressions, statements, and program units. Section 4.2 introduces lexical analyzers. Sections 4.3, 4.4, and 4.5 discuss syntax analyzers.
There are three reasons why lexical analysis is separated from syntax analysis:
Simplicity—Techniques for lexical analysis are less complex than those required for syntax analysis, so the lexical-analysis process can be simpler if it is separate. Also, removing the low-level details of lexical analysis from the syntax analyzer makes the syntax analyzer both smaller and less complex.
Efficiency—Although it pays to optimize the lexical analyzer, because lexical analysis requires a significant portion of total compilation time, it is not fruitful to optimize the syntax analyzer. Separation facilitates this selective optimization.
Portability—Because the lexical analyzer reads input program files and often includes buffering of that input, it is somewhat platform dependent. However, the syntax analyzer can be platform independent. It is always good to isolate machine-dependent parts of any software system.
A lexical analyzer is essentially a pattern matcher. A pattern matcher attempts to find a substring of a given string of characters that matches a given character pattern. Pattern matching is a traditional part of computing. One of the earliest uses of pattern matching was with text editors, such as the ed line editor, which was introduced in an early version of UNIX. Since then, pattern matching has found its way into some programming languages—for example, Perl and JavaScript. It is also available through the standard class libraries of Java, C++, and C#.
A lexical analyzer serves as the front end of a syntax analyzer. Technically, lexical analysis is a part of syntax analysis. A lexical analyzer performs syntax analysis at the lowest level of program structure. An input program appears to a compiler as a single string of characters. The lexical analyzer collects characters into logical groupings and assigns internal codes to the groupings according to their structure. In Chapter 3, these logical groupings are named lexemes, and the internal codes for categories of these groupings are named tokens. Lexemes are recognized by matching the input character string against character string patterns. Although tokens are usually represented as integer values, for the sake of readability of lexical and syntax analyzers, they are often referenced through named constants.
Consider the following example of an assignment statement:
result = oldsum - value / 100;
Following are the tokens and lexemes of this statement:
Token        Lexeme
IDENT        result
ASSIGN_OP    =
IDENT        oldsum
SUB_OP       -
IDENT        value
DIV_OP       /
INT_LIT      100
SEMICOLON    ;
Lexical analyzers extract lexemes from a given input string and produce the corresponding tokens. In the early days of compilers, lexical analyzers often processed an entire source program file and produced a file of tokens and lexemes. Now, however, most lexical analyzers are subprograms that locate the next lexeme in the input, determine its associated token code, and return them to the caller, which is the syntax analyzer. So, each call to the lexical analyzer returns a single lexeme and its token. The only view of the input program seen by the syntax analyzer is the output of the lexical analyzer, one token at a time.
The lexical-analysis process includes skipping comments and white space outside lexemes, as they are not relevant to the meaning of the program. Also, the lexical analyzer inserts lexemes for user-defined names into the symbol table, which is used by later phases of the compiler. Finally, lexical analyzers detect syntactic errors in tokens, such as ill-formed floating-point literals, and report such errors to the user.
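As a rough illustration of the skipping step, the following sketch advances past white space and comments in an in-memory string. The helper name skip_ignorable and the choice of C-style /* ... */ comments are assumptions made for this example; neither comes from the lexical analyzer developed later in this section, which skips only white space.

```c
#include <ctype.h>

/* Hypothetical helper: return a pointer to the first character of the
   next lexeme in s, skipping white space and C-style block comments.
   An unterminated comment consumes the rest of the input. */
const char *skip_ignorable(const char *s) {
    for (;;) {
        /* Skip any run of white-space characters. */
        while (isspace((unsigned char)*s))
            s++;
        if (s[0] == '/' && s[1] == '*') {
            s += 2;                  /* step over the opening delimiter */
            while (*s != '\0' && !(s[0] == '*' && s[1] == '/'))
                s++;
            if (*s != '\0')
                s += 2;              /* step over the closing delimiter */
        } else {
            return s;                /* first character of a lexeme, or end */
        }
    }
}
```

Called on the string "  /* c */  x = 1", the function returns a pointer to the x. The outer loop is needed because white space and comments can alternate any number of times before the next lexeme begins.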
There are three approaches for building a lexical analyzer:
Write a formal description of the token patterns of the language using a descriptive language related to regular expressions. These descriptions are used as input to a software tool that automatically generates a lexical analyzer. There are many such tools available for this. The oldest of these, named lex, is commonly included as part of UNIX systems.
Design a state transition diagram that describes the token patterns of the language and write a program that implements the diagram.
Design a state transition diagram that describes the token patterns of the language and hand-construct a table-driven implementation of the state diagram.
A state transition diagram, or just state diagram, is a directed graph. The nodes of a state diagram are labeled with state names. The arcs are labeled with the input characters that cause the transitions among the states. An arc may also include actions the lexical analyzer must perform when the transition is taken.
State diagrams of the form used for lexical analyzers are representations of a class of mathematical machines called finite automata. Finite automata can be designed to recognize members of a class of languages called regular languages. Regular grammars are generative devices for regular languages. The tokens of a programming language are a regular language, and a lexical analyzer is a finite automaton.
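To make the connection between state diagrams and finite automata concrete, here is a minimal sketch of a table-driven automaton that accepts names (a letter followed by letters or digits) and integer literals (one or more digits). The state names, character-class names, and result codes are all hypothetical choices for this illustration, and the function classifies a whole string rather than extracting lexemes from a longer input.

```c
#include <ctype.h>

/* Character classes (columns of the transition table). */
enum { CC_LETTER, CC_DIGIT, CC_OTHER, NUM_CLASSES };

/* States (rows of the transition table). ST_ERROR is a trap state. */
enum { ST_START, ST_NAME, ST_INT, ST_ERROR, NUM_STATES };

/* Result codes returned by classify(); hypothetical values. */
enum { TK_NONE = 0, TK_NAME = 1, TK_INT = 2 };

/* next_state[s][c] is the state reached from state s on class c. */
static const int next_state[NUM_STATES][NUM_CLASSES] = {
    /*               LETTER    DIGIT     OTHER     */
    /* ST_START */ { ST_NAME,  ST_INT,   ST_ERROR },
    /* ST_NAME  */ { ST_NAME,  ST_NAME,  ST_ERROR },
    /* ST_INT   */ { ST_ERROR, ST_INT,   ST_ERROR },
    /* ST_ERROR */ { ST_ERROR, ST_ERROR, ST_ERROR }
};

/* Map a character (as an unsigned char value) to its class. */
static int char_class(int c) {
    if (isalpha(c)) return CC_LETTER;
    if (isdigit(c)) return CC_DIGIT;
    return CC_OTHER;
}

/* Run the automaton over the whole string s and report whether it is
   a name, an integer literal, or neither. */
int classify(const char *s) {
    int state = ST_START;
    for (; *s != '\0'; s++)
        state = next_state[state][char_class((unsigned char)*s)];
    if (state == ST_NAME) return TK_NAME;
    if (state == ST_INT)  return TK_INT;
    return TK_NONE;
}
```

With this table, classify("sum") and classify("x1") yield TK_NAME, classify("47") yields TK_INT, and classify("4x") yields TK_NONE. All of the language-specific knowledge lives in the table, so changing the token patterns means editing data rather than control flow.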
We now illustrate lexical-analyzer construction with a state diagram and the code that implements it. The state diagram could simply include states and transitions for each and every token pattern. However, that approach results in a very large and complex diagram, because every node in the state diagram would need a transition for every character in the character set of the language being analyzed. We therefore consider ways to simplify it.
Suppose we need a lexical analyzer that recognizes only arithmetic expressions, including variable names and integer literals as operands. Assume that the variable names consist of strings of uppercase letters, lowercase letters, and digits but must begin with a letter. Names have no length limitation. The first thing to observe is that there are 52 different characters (any uppercase or lowercase letter) that can begin a name, which would require 52 transitions from the transition diagram’s initial state. However, a lexical analyzer is interested only in determining that it is a name and is not concerned with which specific name it happens to be. Therefore, we define a character class named LETTER for all 52 letters and use a single transition on the first letter of any name.
Another opportunity for simplifying the transition diagram is with the integer literal tokens. There are 10 different characters that could begin an integer literal lexeme. This would require 10 transitions from the start state of the state diagram. Because specific digits are not a concern of the lexical analyzer, we can build a much more compact state diagram if we define a character class named DIGIT for digits and use a single transition on any character in this character class to a state that collects integer literals.
Because our names can include digits, the transition from the node following the first character of a name can use a single transition on LETTER or DIGIT to continue collecting the characters of a name.
Next, we define some utility subprograms for the common tasks inside the lexical analyzer. First, we need a subprogram, which we can name getChar, that has several duties. When called, getChar gets the next character of input from the input program and puts it in the global variable nextChar. getChar also must determine the character class of the input character and put it in the global variable charClass. The lexeme being built by the lexical analyzer, which could be implemented as a character string or an array, will be named lexeme.
We implement the process of putting the character in nextChar into the string array lexeme in a subprogram named addChar. This subprogram must be explicitly called because programs include some characters that need not be put in lexeme, for example the white-space characters between lexemes. In a more realistic lexical analyzer, comments also would not be placed in lexeme.
When the lexical analyzer is called, it is convenient if the next character of input is the first character of the next lexeme. Because of this, a function named getNonBlank is used to skip white space every time the analyzer is called.
Finally, a subprogram named lookup is needed to compute the token code for the single-character tokens. In our example, these are parentheses and the arithmetic operators. Token codes are numbers arbitrarily assigned to tokens by the compiler writer.
The state diagram in Figure 4.1 describes the patterns for our tokens. It includes the actions required on each transition of the state diagram.
The following is a C implementation of a lexical analyzer specified in the state diagram of Figure 4.1, including a main driver function for testing purposes:
/* front.c - a lexical analyzer system for simple
   arithmetic expressions */
#include <stdio.h>
#include <ctype.h>

/* Global declarations */
/* Variables */
int charClass;
char lexeme[100];
int nextChar;   /* int rather than char, so EOF is distinct from every character */
int lexLen;
int token;
int nextToken;
FILE *in_fp;

/* Function declarations */
void addChar(void);
void getChar(void);
void getNonBlank(void);
int lex(void);
int lookup(char ch);

/* Character classes */
#define LETTER 0
#define DIGIT 1
#define UNKNOWN 99

/* Token codes */
#define INT_LIT 10
#define IDENT 11
#define ASSIGN_OP 20
#define ADD_OP 21
#define SUB_OP 22
#define MULT_OP 23
#define DIV_OP 24
#define LEFT_PAREN 25
#define RIGHT_PAREN 26

/******************************************************/
/* main driver */
int main(void) {
  /* Open the input data file and process its contents */
  if ((in_fp = fopen("front.in", "r")) == NULL)
    printf("ERROR - cannot open front.in \n");
  else {
    getChar();
    do {
      lex();
    } while (nextToken != EOF);
  }
  return 0;
}

/*****************************************************/
/* lookup - a function to lookup operators and parentheses
   and return the token */
int lookup(char ch) {
  switch (ch) {
    case '(':
      addChar();
      nextToken = LEFT_PAREN;
      break;
    case ')':
      addChar();
      nextToken = RIGHT_PAREN;
      break;
    case '+':
      addChar();
      nextToken = ADD_OP;
      break;
    case '-':
      addChar();
      nextToken = SUB_OP;
      break;
    case '*':
      addChar();
      nextToken = MULT_OP;
      break;
    case '/':
      addChar();
      nextToken = DIV_OP;
      break;
    default:
      addChar();
      nextToken = EOF;
      break;
  }
  return nextToken;
}

/*****************************************************/
/* addChar - a function to add nextChar to lexeme */
void addChar(void) {
  if (lexLen <= 98) {
    lexeme[lexLen++] = (char)nextChar;
    lexeme[lexLen] = 0;
  }
  else
    printf("Error - lexeme is too long \n");
}

/*****************************************************/
/* getChar - a function to get the next character of
   input and determine its character class */
void getChar(void) {
  if ((nextChar = getc(in_fp)) != EOF) {
    if (isalpha(nextChar))
      charClass = LETTER;
    else if (isdigit(nextChar))
      charClass = DIGIT;
    else charClass = UNKNOWN;
  }
  else
    charClass = EOF;
}

/*****************************************************/
/* getNonBlank - a function to call getChar until it
   returns a non-whitespace character */
void getNonBlank(void) {
  while (isspace(nextChar))
    getChar();
}

/*****************************************************/
/* lex - a simple lexical analyzer for arithmetic
   expressions */
int lex(void) {
  lexLen = 0;
  getNonBlank();
  switch (charClass) {
    /* Parse identifiers */
    case LETTER:
      addChar();
      getChar();
      while (charClass == LETTER || charClass == DIGIT) {
        addChar();
        getChar();
      }
      nextToken = IDENT;
      break;
    /* Parse integer literals */
    case DIGIT:
      addChar();
      getChar();
      while (charClass == DIGIT) {
        addChar();
        getChar();
      }
      nextToken = INT_LIT;
      break;
    /* Parentheses and operators */
    case UNKNOWN:
      lookup(nextChar);
      getChar();
      break;
    /* EOF */
    case EOF:
      nextToken = EOF;
      lexeme[0] = 'E';
      lexeme[1] = 'O';
      lexeme[2] = 'F';
      lexeme[3] = 0;
      break;
  } /* End of switch */
  printf("Next token is: %d, Next lexeme is %s\n",
         nextToken, lexeme);
  return nextToken;
} /* End of function lex */
This code illustrates the relative simplicity of lexical analyzers. Of course, we have left out input buffering, as well as some other important details. Furthermore, we have dealt with a very small and simple input language.
Consider the following expression:
(sum + 47) / total
Following is the output of the lexical analyzer of front.c when used on this expression:
Next token is: 25, Next lexeme is (
Next token is: 11, Next lexeme is sum
Next token is: 21, Next lexeme is +
Next token is: 10, Next lexeme is 47
Next token is: 26, Next lexeme is )
Next token is: 24, Next lexeme is /
Next token is: 11, Next lexeme is total
Next token is: -1, Next lexeme is EOF
Names and reserved words in programs have similar patterns. Although it is possible to build a state diagram to recognize every specific reserved word of a programming language, that would result in a prohibitively large state diagram. It is much simpler and faster to have the lexical analyzer recognize names and reserved words with the same pattern and use a lookup in a table of reserved words to determine which names are reserved words. Using this approach considers reserved words to be exceptions in the names token category.
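The table lookup described above might be sketched as follows. The reserved words, their token codes, and the function name name_token are hypothetical; only the IDENT code is taken from front.c. A linear search keeps the sketch short, whereas a lexical analyzer for a real language would more likely use a hash table or a sorted table with binary search.

```c
#include <string.h>

#define IDENT 11             /* token code for names, as in front.c */

/* Hypothetical token codes for a few reserved words. */
#define FOR_CODE   30
#define IF_CODE    31
#define ELSE_CODE  32
#define WHILE_CODE 33

/* The table of reserved words and their token codes. */
static const struct { const char *word; int code; } reserved[] = {
    { "for",   FOR_CODE   },
    { "if",    IF_CODE    },
    { "else",  ELSE_CODE  },
    { "while", WHILE_CODE }
};

/* Return the reserved-word token code for lexeme, or IDENT if the
   lexeme is an ordinary name. The lexeme is assumed to have already
   been recognized with the common name pattern. */
int name_token(const char *lexeme) {
    size_t i;
    for (i = 0; i < sizeof reserved / sizeof reserved[0]; i++)
        if (strcmp(lexeme, reserved[i].word) == 0)
            return reserved[i].code;
    return IDENT;
}
```

In front.c, such a function could be called in the LETTER case of lex, in place of the unconditional assignment of IDENT to nextToken. Note that a name such as iffy is still classified as IDENT, because the whole lexeme is compared, not a prefix.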
A lexical analyzer often is responsible for the initial construction of the symbol table, which acts as a database of names for the compiler. The entries in the symbol table store information about user-defined names, as well as the attributes of the names. For example, if the name is that of a variable, the variable’s type is one of its attributes that will be stored in the symbol table. Names are usually placed in the symbol table by the lexical analyzer. The attributes of a name are usually put in the symbol table by some part of the compiler that is subsequent to the actions of the lexical analyzer.
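A very small symbol table along these lines could look like the sketch below. Everything in it (the struct layout, the fixed capacity, the function name intern, and the use of -1 for an attribute that is not yet known) is an assumption made for illustration; production compilers typically use hash tables and handle nested scopes.

```c
#include <string.h>

#define MAX_SYMBOLS 256
#define MAX_NAME_LEN 99      /* matches the lexeme limit in front.c */

/* One entry per user-defined name. The type field stands in for the
   attributes that a later phase of the compiler would fill in. */
struct symbol {
    char name[MAX_NAME_LEN + 1];
    int  type;               /* -1 means "not yet known" */
};

static struct symbol table[MAX_SYMBOLS];
static int num_symbols = 0;

/* Return the index of name in the table, inserting it if necessary.
   Returns -1 if the table is full or the name is too long. */
int intern(const char *name) {
    int i;
    for (i = 0; i < num_symbols; i++)
        if (strcmp(table[i].name, name) == 0)
            return i;                    /* already present */
    if (num_symbols == MAX_SYMBOLS || strlen(name) > MAX_NAME_LEN)
        return -1;
    strcpy(table[num_symbols].name, name);
    table[num_symbols].type = -1;        /* attribute set by a later phase */
    return num_symbols++;
}
```

The lexical analyzer would call intern each time it recognizes a name, so repeated occurrences of sum map to the same entry. The split described in the text is visible here: the name is stored immediately, while the type field is left for a later phase.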
The part of the process of analyzing syntax that is referred to as syntax analysis is often called parsing. We will use these two terms interchangeably.
This section discusses the general parsing problem and introduces the two main categories of parsing algorithms, top-down and bottom-up, as well as the complexity of the parsing process.
Parsers for programming languages construct parse trees for given programs. In some cases, the parse tree is only implicitly constructed, meaning that perhaps only a traversal of the tree is generated. But in all cases, the information required to build the parse tree is created during the parse. Both parse trees and derivations include all of the syntactic information needed by a language processor.
There are two distinct goals of syntax analysis: First, the syntax analyzer must check the input program to determine whether it is syntactically correct. When an error is found, the analyzer must produce a diagnostic message and recover. In this case, recovery means it must get back to a normal state and continue its analysis of the input program. This step is required so that the compiler finds as many errors as possible during a single analysis of the input program. If it is not done well, error recovery may create more errors, or at least more error messages. The second goal of syntax analysis is to produce a complete parse tree, or at least trace the structure of the complete parse tree, for syntactically correct input. The parse tree (or its trace) is used as the basis for translation.
Parsers are categorized according to the direction in which they build parse trees. The two broad classes of parsers are top-down, in which the tree is built from the root downward to the leaves, and bottom-up, in which the parse tree is built from the leaves upward to the root.
In this chapter, we use a small set of notational conventions for grammar symbols and strings to make the discussion less cluttered. For formal languages, they are as follows:
Terminal symbols—lowercase letters at the beginning of the alphabet (a, b, . . .)
Nonterminal symbols—uppercase letters at the beginning of the alphabet (A, B, . . .)
Terminals or nonterminals—uppercase letters at the end of the alphabet (W, X, Y, Z)
Strings of terminals—lowercase letters at the end of the alphabet (w, x, y, z)
Mixed strings (terminals and/or nonterminals)—lowercase Greek letters
For programming languages, terminal symbols are the small-scale syntactic constructs of the language, what we have referred to as lexemes. The nonterminal symbols of programming languages are usually connotative names or abbreviations, surrounded by angle brackets—for example, <while_statement>, <expr>, and <function_def>. The sentences of a language (programs, in the case of a programming language) are strings of terminals. Mixed strings describe right-hand sides (RHSs) of grammar rules and are used in parsing algorithms.
A top-down parser traces or builds a parse tree in preorder. A preorder traversal of a parse tree begins with the root. Each node is visited before its branches are followed. Branches from a particular node are followed in left-to-right order. This corresponds to a leftmost derivation.
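The preorder visiting rule can be sketched with a small explicit tree. The node structure, the limit of three children, and the function name preorder are hypothetical, and a real parser would usually produce this order implicitly through its sequence of rule applications rather than by walking a stored tree.

```c
#include <string.h>

#define MAX_CHILDREN 3

/* Hypothetical parse-tree node: a label plus up to MAX_CHILDREN
   subtrees, stored left to right. Leaves have num_children == 0. */
struct node {
    const char *label;
    int num_children;
    const struct node *child[MAX_CHILDREN];
};

/* Append the labels of the tree rooted at n to out in preorder:
   the node itself first, then its subtrees from left to right.
   out must already hold a string (possibly empty), and the caller
   must supply a buffer large enough for the result. */
void preorder(const struct node *n, char *out) {
    int i;
    strcat(out, n->label);
    strcat(out, " ");
    for (i = 0; i < n->num_children; i++)
        preorder(n->child[i], out);
}
```

For a tree whose root S has children A, +, and B, where A has the single leaf a and B has the single leaf b, the labels are visited in the order S A a + B b. The nonterminals are reached in the order S, A, B, which is the order in which they are expanded in the leftmost derivation S => A + B => a + B => a + b.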
In terms of the derivation, a top-down parser can be described as follows: Given a sentential form that is part of a leftmost derivation, the parser’s task is to find the next sentential form in that leftmost derivation. The general form of a left sentential form is $xA\alpha$, where, by our notational conventions, $x$ is a string of terminal symbols, $A$ is a nonterminal, and $\alpha$ is a mixed string. Because $x$ contains only terminals, $A$ is the leftmost nonterminal in the sentential form, so it is the one that must be expanded to get the next sentential form in a leftmost derivation. Determining the next sentential form is a matter of choosing the correct grammar rule that has $A$ as its LHS. For example, if the current sentential form is $xA\alpha$ and the A-rules are $A \rightarrow bB$, $A \rightarrow cBb$, and $A \rightarrow a$, a top-down parser must choose among these three rules to get the next sentential form, which could be $xbB\alpha$, $xcBb\alpha$, or $xa\alpha$. This is the parsing decision problem for top-down parsers.
Different top-down parsing algorithms use different information to make parsing decisions. The most common top-down parsers choose the correct RHS for the leftmost nonterminal in the current sentential form by comparing the next token of input with the first symbols that can be generated by the RHSs of those rules. Whichever RHS has that token at the left end of the string it generates is the correct one. So, in the sentential form $xA\alpha$, the parser would use whatever token would be the first generated by $A$ to determine which A-rule should be used to get the next sentential form. In the example above, the three RHSs of the A-rules all begin with different terminal symbols. The parser can easily choose the correct RHS based on the next token of input, which must be a, b, or c in this example. In general, choosing the correct RHS is not so straightforward, because some of the RHSs of the leftmost nonterminal in the current sentential form may begin with a nonterminal.
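For the simple case in which every RHS begins with a distinct terminal, the parsing decision reduces to a comparison against the first symbol of each RHS. The sketch below assumes the three A-rules are A -> bB, A -> cBb, and A -> a, represents tokens as single characters, and uses the hypothetical function name choose_a_rhs. It does not handle RHSs that begin with a nonterminal, which is the harder case mentioned at the end of the paragraph.

```c
#include <stddef.h>

/* The RHSs of the assumed A-rules: A -> bB | cBb | a.
   Each begins with a different terminal, so one token of
   lookahead is enough to pick a rule. */
static const char *const a_rules[] = { "bB", "cBb", "a" };

/* Return the RHS to use for expanding A given the next input token,
   or NULL if no A-rule can begin with that token (a syntax error). */
const char *choose_a_rhs(char next_token) {
    size_t i;
    for (i = 0; i < sizeof a_rules / sizeof a_rules[0]; i++)
        if (a_rules[i][0] == next_token)
            return a_rules[i];
    return NULL;
}
```

With next token c, the function selects cBb; with next token d, it returns NULL, which a parser would report as a syntax error.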
The most common top-down parsing algorithms are closely related. A recursive-descent parser is a coded version of a syntax analyzer based directly on the BNF description of the syntax of the language. The most common alternative to recursive descent is to use a parsing table, rather than code, to implement the BNF rules. Both of these, which are called LL algorithms, are equally powerful, meaning they work on the same subset of all context-free grammars. The first L in LL specifies a left-to-right scan of the input; the second L specifies that a leftmost derivation is generated. Section 4.4 introduces the recursive-descent approach to implementing an LL parser.
A bottom-up parser constructs a parse tree by beginning at the leaves and progressing toward the root. This parse order corresponds to the reverse of a rightmost derivation. That is, the sentential forms of the derivation are produced in order of last to first. In terms of the derivation, a bottom-up parser can be described as follows: Given a right sentential form α, the parser must determine what substring of α is the RHS of the rule in the grammar that must be reduced to its LHS to produce the previous sentential form in the rightmost derivation. For example, the first step for a bottom-up parser is to determine which substring of the initial given sentence is the RHS to be reduced to its corresponding LHS to get the second-to-last sentential form in the derivation. The process of finding the correct RHS to reduce is complicated by the fact that a given right sentential form may include more than one RHS from the grammar of the language being parsed. The correct RHS is called the handle. A right sentential form is a sentential form that appears in a rightmost derivation.
Consider the following grammar and derivation:
S → aAc
A → aA | b
S => aAc => aaAc => aabc
A bottom-up parser of this sentence, aabc, starts with the sentence and must find the handle in it. In this example, this is an easy task, for the string contains only one RHS, b. When the parser replaces b with its LHS, A, it gets the second to last sentential form in the derivation, aaAc. In the general case, as stated previously, finding the handle is much more difficult, because a sentential form may include several different RHSs.
A bottom-up parser finds the handle of a given right sentential form by examining the symbols on one or both sides of a possible handle. Symbols to the right of the possible handle are usually tokens in the input that have not yet been analyzed.
The most common bottom-up parsing algorithms are in the LR family, where the L specifies a left-to-right scan of the input and the R specifies that a rightmost derivation is generated.
Parsing algorithms that work for any unambiguous grammar are complicated and inefficient. In fact, the complexity of such algorithms is O(n³), which means the amount of time they take is on the order of the cube of the length of the string to be parsed. This relatively large amount of time is required because these algorithms frequently must back up and reparse part of the sentence being analyzed. Reparsing is required when the parser has made a mistake in the parsing process. Backing up the parser also requires that part of the parse tree being constructed (or its trace) be dismantled and rebuilt. O(n³) algorithms are normally not useful for practical processes, such as syntax analysis for a compiler, because they are far too slow. In situations such as this, computer scientists often search for algorithms that are faster, though less general. Generality is traded for efficiency. In terms of parsing, faster algorithms have been found that work for only a subset of the set of all possible grammars. These algorithms are acceptable as long as the subset includes grammars that describe programming languages. (Actually, as discussed in Chapter 3, the whole class of context-free grammars is not adequate to describe all of the syntax of most programming languages.)
All algorithms used for the syntax analyzers of commercial compilers have complexity O(n), which means the time they take is linearly related to the length of the string to be parsed. This is vastly more efficient than O(n³) algorithms.
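To give a feel for the size of this gap, the following arithmetic is a rough illustration only. It assumes a hypothetical input of n = 10⁴ tokens and ignores the constant factors that a real algorithm would carry.

```latex
\[
n = 10^{4}: \qquad
\underbrace{n^{3} = 10^{12}}_{\text{cubic algorithm}} \qquad
\underbrace{n = 10^{4}}_{\text{linear algorithm}} \qquad
\frac{n^{3}}{n} = n^{2} = 10^{8}
\]
```

Under these assumptions the cubic algorithm performs on the order of a hundred million times as many steps as the linear one, and the ratio grows with the square of the input length.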
This section introduces the recursive-descent top-down parser implementation process.
A recursive-descent parser is so named because it consists of a collection of subprograms, many of which are recursive, and it produces a parse tree in top-down order. This recursion is a reflection of the nature of programming languages, which include several different kinds of nested structures. For example, statements are often nested in other statements. Also, parentheses in expressions must be properly nested. The syntax of these structures is naturally described with recursive grammar rules.
EBNF is ideally suited for recursive-descent parsers. Recall from Chapter 3 that the primary EBNF extensions are braces, which specify that what they enclose can appear zero or more times, and brackets, which specify that what they enclose can appear once or not at all. Note that in both cases, the enclosed symbols are optional. Consider the following examples:
<if_statement> → if <logic_expr> <statement> [else <statement>]
<ident_list> → ident {, ident}
In the first rule, the else clause of an if statement is optional. In the second, an <ident_list> is an identifier, followed by zero or more repetitions of a comma and an identifier.
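The way braces map onto a loop can be sketched as follows. This is a minimal illustration, not part of the parser developed in this chapter: it assumes single-character tokens (any letter stands for ident), and the names ident_list, is_ident, and src are hypothetical.

```c
#include <ctype.h>

/* Hypothetical token model: a letter is an ident, ',' is a comma. */
static const char *src;   /* remaining input */

static int is_ident(char c) { return isalpha((unsigned char)c) != 0; }

/* <ident_list> -> ident {, ident}
   Returns the number of identifiers parsed, or -1 on a syntax error. */
int ident_list(const char *text) {
    int count = 0;
    src = text;
    if (!is_ident(*src)) return -1;      /* the first ident is required */
    src++; count++;
    while (*src == ',') {                /* braces: zero or more repetitions */
        src++;                           /* pass over the comma */
        if (!is_ident(*src)) return -1;  /* a comma must be followed by an ident */
        src++; count++;
    }
    return *src == '\0' ? count : -1;    /* reject leftover input */
}
```

The braces in the rule become the while loop, and the loop condition is simply a test for the terminal that begins the repeated part.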
A recursive-descent parser has a subprogram for each nonterminal in its associated grammar. The responsibility of the subprogram associated with a particular nonterminal is as follows: When given an input string, it traces out the parse tree that can be rooted at that nonterminal and whose leaves match the input string. In effect, a recursive-descent parsing subprogram is a parser for the language (set of strings) that is generated by its associated nonterminal.
Consider the following EBNF description of simple arithmetic expressions:
<expr> → <term> {(+ | -) <term>}
<term> → <factor> {(* | /) <factor>}
<factor> → id | int_constant | ( <expr> )
Recall from Chapter 3 that an EBNF grammar for arithmetic expressions, such as this one, does not force any associativity rule. Therefore, when using such a grammar as the basis for a compiler, one must take care to ensure that the code generation process, which is normally driven by syntax analysis, produces code that adheres to the associativity rules of the language. This can be done easily when recursive-descent parsing is used.
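One way to see how a recursive-descent routine can impose left associativity is to let it compute values instead of generating code. The sketch below is an assumption-laden illustration: operands are single digits, there is no white space, <factor> is simplified to a digit, and the names evaluate and src are hypothetical.

```c
#include <ctype.h>

static const char *src;   /* hypothetical input: single-digit operands, no spaces */

/* <factor> -> digit   (simplified for this sketch) */
static int factor(void) {
    return isdigit((unsigned char)*src) ? *src++ - '0' : 0;
}

/* <term> -> <factor> {(* | /) <factor>} */
static int term(void) {
    int value = factor();
    while (*src == '*' || *src == '/') {
        char op = *src++;
        int rhs = factor();
        if (op == '*') value *= rhs;
        else if (rhs != 0) value /= rhs;   /* division by zero is skipped */
    }
    return value;
}

/* <expr> -> <term> {(+ | -) <term>}
   Folding each new term into the running value groups operators to the left. */
static int expr(void) {
    int value = term();
    while (*src == '+' || *src == '-') {
        char op = *src++;
        int rhs = term();
        value = (op == '+') ? value + rhs : value - rhs;
    }
    return value;
}

int evaluate(const char *text) { src = text; return expr(); }
```

With this structure, 8-3-2 is computed as (8-3)-2, because each iteration of the loop applies its operator to everything accumulated so far. The grammar did not require that grouping; the loop body chose it.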
In the following recursive-descent function, expr, the lexical analyzer is the function lex, which is implemented in Section 4.2. It gets the next lexeme and puts its token code in the global variable nextToken. The token codes are defined as named constants, as in Section 4.2.
A recursive-descent subprogram for a rule with a single RHS is relatively simple. For each terminal symbol in the RHS, that terminal symbol is compared with nextToken. If they do not match, it is a syntax error. If they match, the lexical analyzer is called to get the next input token. For each nonterminal, the parsing subprogram for that nonterminal is called.
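The terminal-handling step just described is often packaged as a small helper. The following sketch assumes single-character tokens and a toy rule, <paren_id> → ( x ), invented only for this illustration. The helper match, the counter errors, and the stand-in versions of lex and error are hypothetical and are not the functions of Section 4.2.

```c
static const char *src;      /* hypothetical input: one character per token */
static char nextToken;       /* current token, named as in the text */
static int errors;           /* number of syntax errors detected */

/* Stand-ins for the lexical analyzer and the error routine. */
static void lex(void) { nextToken = *src ? *src++ : '\0'; }
static void error(void) { errors++; }

/* Compare a terminal from the RHS with nextToken; advance on a match. */
static void match(char expected) {
    if (nextToken == expected) lex();
    else error();
}

/* Hypothetical single-RHS rule: <paren_id> -> ( x )
   Returns 1 if text is a sentence of the rule, 0 otherwise. */
int parse_paren_id(const char *text) {
    src = text;
    errors = 0;
    lex();                            /* load the first token */
    match('(');
    match('x');
    match(')');
    if (nextToken != '\0') error();   /* leftover input */
    return errors == 0;
}
```

Each terminal in the RHS becomes one call to match. A nonterminal in the RHS would become a call to that nonterminal's parsing subprogram instead.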
The recursive-descent subprogram for the first rule in the previous example grammar, written in C, is
/* expr
   Parses strings in the language generated by the rule:
   <expr> -> <term> {(+ | -) <term>}
*/
void expr() {
  printf("Enter <expr>\n");
  /* Parse the first term */
  term();
  /* As long as the next token is + or -, get
     the next token and parse the next term */
  while (nextToken == ADD_OP || nextToken == SUB_OP) {
    lex();
    term();
  }
  printf("Exit <expr>\n");
} /* End of function expr */
Notice that the expr function includes tracing output statements, which are included to produce the example output shown later in this section.
Recursive-descent parsing subprograms are written with the convention that each one leaves the next token of input in nextToken. So, whenever a parsing function begins, it assumes that nextToken has the code for the leftmost token of the input that has not yet been used in the parsing process.
The part of the language that the expr function parses consists of one or more terms, separated by either plus or minus operators. This is the language generated by the nonterminal <expr>. Therefore, first it calls the function that parses terms (term). Then it continues to call that function as long as it finds ADD_OP or SUB_OP tokens (which it passes over by calling lex). This recursive-descent function is simpler than most, because its associated rule has only one RHS. Furthermore, it does not include any code for syntax error detection or recovery, because there are no detectable errors associated with the grammar rule.
A recursive-descent parsing subprogram for a nonterminal whose rule has more than one RHS begins with code to determine which RHS is to be parsed. Each RHS is examined (at compiler construction time) to determine the set of terminal symbols that can appear at the beginning of sentences it can generate. By matching these sets against the next token of input, the parser can choose the correct RHS.
The parsing subprogram for <term> is similar to that for <expr>:
/* term
   Parses strings in the language generated by the rule:
   <term> -> <factor> {(* | /) <factor>}
*/
void term() {
  printf("Enter <term>\n");
  /* Parse the first factor */
  factor();
  /* As long as the next token is * or /, get the
     next token and parse the next factor */
  while (nextToken == MULT_OP || nextToken == DIV_OP) {
    lex();
    factor();
  }
  printf("Exit <term>\n");
} /* End of function term */
The function for the <factor> nonterminal of our arithmetic expression grammar must choose among its RHSs: an id or an int_constant, which it handles together, and a parenthesized expression. It also includes error detection. In the function for <factor>, the reaction to detecting a syntax error is simply to call the error function. In a real parser, a diagnostic message must be produced when an error is detected. Furthermore, parsers must recover from the error so that the parsing process can continue.
/* factor
   Parses strings in the language generated by the rule:
   <factor> -> id | int_constant | ( <expr> )
*/
void factor() {
  printf("Enter <factor>\n");
  /* Determine which RHS */
  if (nextToken == IDENT || nextToken == INT_LIT)
    /* Get the next token */
    lex();
  /* If the RHS is ( <expr> ), call lex to pass over the
     left parenthesis, call expr, and check for the right
     parenthesis */
  else {
    if (nextToken == LEFT_PAREN) {
      lex();
      expr();
      if (nextToken == RIGHT_PAREN)
        lex();
      else
        error();
    } /* End of if (nextToken == ... */
    /* It was not an id, an integer literal, or a left
       parenthesis */
    else error();
  } /* End of else */
  printf("Exit <factor>\n");
} /* End of function factor */
Following is the trace of the parse of the example expression (sum + 47) / total, using the parsing functions expr, term, and factor, and the function lex from Section 4.2. Note that the parse begins by calling lex and the start symbol routine, in this case, expr.
Next token is: 25 Next lexeme is (
Enter <expr>
Enter <term>
Enter <factor>
Next token is: 11 Next lexeme is sum
Enter <expr>
Enter <term>
Enter <factor>
Next token is: 21 Next lexeme is +
Exit <factor>
Exit <term>
Next token is: 10 Next lexeme is 47
Enter <term>
Enter <factor>
Next token is: 26 Next lexeme is )
Exit <factor>
Exit <term>
Exit <expr>
Next token is: 24 Next lexeme is /
Exit <factor>
Next token is: 11 Next lexeme is total
Enter <factor>
Next token is: -1 Next lexeme is EOF
Exit <factor>
Exit <term>
Exit <expr>
The parse tree traced by the parser for the preceding expression is shown in Figure 4.2.
One more example grammar rule and parsing function should help solidify the reader’s understanding of recursive-descent parsing. Following is a grammatical description of the Java if statement:
<ifstmt> → if (<boolexpr>) <statement> [else <statement>]
The recursive-descent subprogram for this rule follows:
/* Function ifstmt
   Parses strings in the language generated by the rule:
   <ifstmt> -> if (<boolexpr>) <statement>
               [else <statement>]
*/
void ifstmt() {
  /* Be sure the first token is 'if' */
  if (nextToken != IF_CODE)
    error();
  else {
    /* Call lex to get to the next token */
    lex();
    /* Check for the left parenthesis */
    if (nextToken != LEFT_PAREN)
      error();
    else {
      /* Call lex to pass over the left parenthesis */
      lex();
      /* Parse the Boolean expression */
      boolexpr();
      /* Check for the right parenthesis */
      if (nextToken != RIGHT_PAREN)
        error();
      else {
        /* Call lex to pass over the right parenthesis */
        lex();
        /* Parse the then clause */
        statement();
        /* If an else is next, parse the else clause */
        if (nextToken == ELSE_CODE) {
          /* Call lex to get over the else */
          lex();
          statement();
        } /* end of if (nextToken == ELSE_CODE ... */
      } /* end of else of if (nextToken != RIGHT ... */
    } /* end of else of if (nextToken != LEFT ... */
  } /* end of else of if (nextToken != IF_CODE ... */
} /* end of ifstmt */
Notice that this function uses parser functions for statements and Boolean expressions that are not given in this section.
The objective of these examples is to convince you that a recursive-descent parser can be easily written if an appropriate grammar is available for the language. The characteristics of a grammar that allows a recursive-descent parser to be built are discussed in the following subsection.
Before choosing to use recursive descent as a parsing strategy for a compiler or other program analysis tool, one must consider the limitations of the approach, in terms of grammar restrictions. This section discusses these restrictions and their possible solutions.
One simple grammar characteristic that causes a catastrophic problem for LL parsers is left recursion. For example, consider the following rule:
A → A + B
A recursive-descent parser subprogram for A immediately calls itself to parse the first symbol in its RHS. That activation of the A parser subprogram then immediately calls itself again, and again, and so forth. It is easy to see that this leads nowhere (except to stack overflow).
The left recursion in the rule A → A + B is called direct left recursion, because it occurs in one rule. Direct left recursion can be eliminated from a grammar by the following process:
For each nonterminal, A,
1. Group the A-rules as A → Aα₁ | … | Aαₘ | β₁ | β₂ | … | βₙ, where none of the β's begins with A.
2. Replace the original A-rules with
A → β₁A′ | β₂A′ | … | βₙA′
A′ → α₁A′ | α₂A′ | … | αₘA′ | ε
Note that ε specifies the empty string. A rule that has ε as its RHS is called an erasure rule, because its use in a derivation effectively erases its LHS from the sentential form.
Consider the following example grammar and the application of the above process:
E → E + T | T
T → T * F | F
F → (E) | id
For the E-rules, we have α₁ = + T and β = T, so we replace the E-rules with
E → T E′
E′ → + T E′ | ε
For the T-rules, we have α₁ = * F and β = F, so we replace the T-rules with
T → F T′
T′ → * F T′ | ε
Because there is no left recursion in the F-rules, they remain the same, so the complete replacement grammar is
E → T E′
E′ → + T E′ | ε
T → F T′
T′ → * F T′ | ε
F → (E) | id
This grammar generates the same language as the original grammar but is not left recursive.
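Once the left recursion is gone, each nonterminal can be given its own subprogram without risk of a routine calling itself before consuming input. The following recognizer is a sketch under stated assumptions: the replacement grammar is E → T E′, E′ → + T E′ | ε, T → F T′, T′ → * F T′ | ε, F → (E) | id; the character 'i' stands for the token id; and the names recognizes, src, and ok are hypothetical.

```c
static const char *src;   /* hypothetical input: 'i' stands for the token id */
static int ok;            /* cleared when a syntax error is found */

static void E(void);

/* F -> ( E ) | id */
static void F(void) {
    if (*src == 'i') {
        src++;
    } else if (*src == '(') {
        src++;                              /* pass over '(' */
        E();
        if (*src == ')') src++; else ok = 0;
    } else {
        ok = 0;                             /* no RHS of F starts here */
    }
}

/* T' -> * F T' | epsilon */
static void Tprime(void) {
    if (*src == '*') { src++; F(); Tprime(); }
    /* otherwise apply the erasure rule: consume nothing */
}

/* T -> F T' */
static void T(void) { F(); Tprime(); }

/* E' -> + T E' | epsilon */
static void Eprime(void) {
    if (*src == '+') { src++; T(); Eprime(); }
}

/* E -> T E' */
static void E(void) { T(); Eprime(); }

/* Returns 1 if text is a sentence of the grammar, 0 otherwise. */
int recognizes(const char *text) {
    src = text;
    ok = 1;
    E();
    return ok && *src == '\0';
}
```

Every recursive call in Tprime and Eprime is preceded by consuming an operator, so the recursion always makes progress through the input.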
As was the case with the expression grammar written using EBNF in Section 4.4.1, this grammar does not specify left associativity of operators. However, it is relatively easy to design the code generation based on this grammar so that the addition and multiplication operators will have left associativity.
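One possible design, sketched here with evaluation standing in for code generation, is to pass the value of everything parsed so far into the subprogram for the primed nonterminal. The sketch makes several assumptions: it uses the cut-down rules E → F E′ and E′ → - F E′ | ε, it substitutes subtraction for addition because subtraction makes the grouping observable, operands are single digits, and the names evaluate_left and src are hypothetical.

```c
#include <ctype.h>

static const char *src;   /* hypothetical input: single-digit operands, no spaces */

/* F -> digit   (a simplified stand-in for the F-rules) */
static int F(void) {
    return isdigit((unsigned char)*src) ? *src++ - '0' : 0;
}

/* E' -> - F E' | epsilon
   'left' carries the value of everything parsed so far, so each
   operator is applied before the rest of the input is examined. */
static int Eprime(int left) {
    if (*src == '-') {
        src++;                         /* pass over the operator */
        int right = F();
        return Eprime(left - right);   /* fold to the left, then continue */
    }
    return left;                       /* erasure rule: nothing to consume */
}

/* E -> F E' */
int evaluate_left(const char *text) {
    src = text;
    return Eprime(F());
}
```

For the input 8-3-2 this computes (8-3)-2. Applying the operator only after the recursive call returned would instead group the operands to the right.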
Indirect left recursion poses the same problem as direct left recursion. For example, suppose we have
A → B a A
B → A b
A recursive-descent parser for these rules would have the A subprogram immediately call the subprogram for B, which immediately calls the A subprogram. So, the problem is the same as for direct left recursion. The problem of left recursion is not confined to the recursive-descent approach to building top-down parsers. It is a problem for all top-down parsing algorithms. Fortunately, left recursion is not a problem for bottom-up parsing algorithms.
There is an algorithm to modify a given grammar to remove indirect left recursion (Aho et al., 2006), but it is not covered here. When writing a grammar for a programming language, one can usually avoid including left recursion, both direct and indirect.
Left recursion is not the only grammar trait that disallows top-down parsing. Another is whether the parser can always choose the correct RHS on the basis of the next token of input, using only the first token generated by the leftmost nonterminal in the current sentential form. There is a relatively simple test of a non-left recursive grammar that indicates whether this can be done, called the pairwise disjointness test. This test requires the ability to compute a set based on the RHSs of a given nonterminal symbol in a grammar. These sets, which are called FIRST, are defined as
FIRST(α) = {a | α =>* aβ} (If α =>* ε, ε is in FIRST(α))
in which =>* means 0 or more derivation steps.
An algorithm to compute FIRST for any mixed string α can be found in Aho et al. (2006). For our purposes, FIRST can usually be computed by inspection of the grammar.
The pairwise disjointness test is as follows:
For each nonterminal, A, in the grammar that has more than one RHS, for each pair of rules, A → αᵢ and A → αⱼ, it must be true that
FIRST(αᵢ) ∩ FIRST(αⱼ) = ∅
(The intersection of the two sets, FIRST(αᵢ) and FIRST(αⱼ), must be empty.)
In other words, if a nonterminal A has more than one RHS, the first terminal symbol that can be generated in a derivation for each of them must be unique to that RHS. Consider the following rules:
A → aB | bAb | Bb
B → cB | d
The FIRST sets for the RHSs of the A-rules are {a}, {b}, and {c, d}, which are clearly disjoint. Therefore, these rules pass the pairwise disjointness test. What this means, in terms of a recursive-descent parser, is that the code of the subprogram for parsing the nonterminal A can choose which RHS it is dealing with by seeing only the first terminal symbol of input (token) that is generated by the nonterminal. Now consider the rules
A → aB | BAb
B → aB | b
The FIRST sets for the RHSs in the A-rules are {a} and {a, b}, which are clearly not disjoint. So, these rules fail the pairwise disjointness test. In terms of the parser, the subprogram for A could not determine which RHS was being parsed by looking at the next symbol of input, because if it were an a, it could be either RHS. This issue is of course more complex if one or more of the RHSs begin with nonterminals.
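When the FIRST sets are already known, the test itself is mechanical. The sketch below assumes that terminals are the lowercase letters a through z and represents each FIRST set as a bitmask; the type TermSet and the functions make_set and pairwise_disjoint are hypothetical names introduced for this illustration. Computing the FIRST sets is not shown.

```c
/* Each FIRST set is a bitmask over the terminals a..z (bit 0 = 'a'). */
typedef unsigned int TermSet;

/* Build a set from a string of lowercase terminal symbols, e.g. "cd". */
TermSet make_set(const char *terminals) {
    TermSet s = 0;
    for (; *terminals; terminals++)
        s |= 1u << (*terminals - 'a');
    return s;
}

/* Return 1 if every pair of the n FIRST sets is disjoint, else 0. */
int pairwise_disjoint(const TermSet first[], int n) {
    for (int i = 0; i < n; i++)
        for (int j = i + 1; j < n; j++)
            if (first[i] & first[j])   /* nonempty intersection */
                return 0;
    return 1;
}
```

Given the sets {a}, {b}, and {c, d}, the function reports that the rules pass; given {a} and {a, b}, it reports that they fail, because both sets contain a.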
In many cases, a grammar that fails the pairwise disjointness test can be modified so that it will pass the test. For example, consider the rule
<variable> → identifier | identifier [<expression>]
This states that a <variable> is either an identifier or an identifier followed by an expression in brackets (a subscript). These rules clearly do not pass the pairwise disjointness test, because both RHSs begin with the same terminal, identifier. This problem can be alleviated through a process called left factoring.
We now take an informal look at left factoring. Consider our rules for <variable>. Both RHSs begin with identifier. The parts that follow identifier in the two RHSs are ε (the empty string) and [<expression>]. The two rules can be replaced by the following two rules:
<variable> → identifier <new>
<new> → ε | [<expression>]
It is not difficult to see that together, these two rules generate the same language as the two rules with which we began. However, these two pass the pairwise disjointness test.
If the grammar is being used as the basis for a recursive-descent parser, an alternative to left factoring is available. With an EBNF extension, the problem disappears in a way that is very similar to the left factoring solution. Consider the original rules above for <variable>. The subscript can be made optional by placing it in square brackets, as in
<variable> → identifier [[<expression>]]
In this rule, the outer brackets are metasymbols that indicate that what is inside is optional. The inner brackets are terminal symbols of the programming language being described. The point is that we replaced two rules with a single rule that generates the same language but passes the pairwise disjointness test.
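In a recursive-descent subprogram, the optional part of such a rule becomes a single if statement that tests for the terminal left bracket. The following sketch assumes single-character tokens, with 'i' standing for identifier and 'e' standing in for a complete <expression>; the name variable and the token model are hypothetical.

```c
static const char *src;   /* hypothetical tokens: 'i' = identifier, 'e' = <expression> */

/* <variable> -> identifier, optionally followed by a bracketed <expression>.
   Returns 1 if text is a valid <variable>, 0 otherwise. */
int variable(const char *text) {
    src = text;
    if (*src != 'i') return 0;       /* the identifier is required */
    src++;
    if (*src == '[') {               /* optional part, chosen on one token */
        src++;                       /* pass over the terminal '[' */
        if (*src != 'e') return 0;   /* stand-in for parsing <expression> */
        src++;
        if (*src != ']') return 0;   /* the terminal ']' must close it */
        src++;
    }
    return *src == '\0';             /* reject leftover input */
}
```

The decision is made after the identifier has been consumed, which is why the two original RHSs no longer compete for the same first token.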
A formal algorithm for left factoring can be found in Aho et al. (2006). Left factoring cannot solve all pairwise disjointness problems of grammars. In some cases, rules must be rewritten in other ways to eliminate the problem.
This section introduces the general process of bottom-up parsing and includes a description of the LR parsing algorithm.
Consider the following grammar for arithmetic expressions:
E → E + T | T
T → T * F | F
F → (E) | id
Notice that this grammar generates the same arithmetic expressions as the example in Section 4.4. The difference is that this grammar is left recursive, which is acceptable to bottom-up parsers. Also note that grammars for bottom-up parsers normally do not include metasymbols such as those used to specify extensions to BNF. The following rightmost derivation illustrates this grammar:
E => E + T (handle: E + T)
=> E + T * F (handle: T * F)
=> E + T * id (handle: id)
=> E + F * id (handle: F)
=> E + id * id (handle: the first id)
=> T + id * id (handle: T)
=> F + id * id (handle: F)
=> id + id * id (handle: the first id)
The handle noted beside each sentential form in this derivation is the RHS that is rewritten as its corresponding LHS to get the previous sentential form. The process of bottom-up parsing produces the reverse of a rightmost derivation. So, in the example derivation, a bottom-up parser starts with the last sentential form (the input sentence) and produces the sequence of sentential forms from there until all that remains is the start symbol, which in this grammar is E. In each step, the task of the bottom-up parser is to find the specific RHS, the handle, in the sentential form that must be rewritten to get the next (previous) sentential form. As mentioned earlier, a right sentential form may include more than one RHS. For example, the right sentential form
E + T * id
includes three RHSs, E + T, T, and id. Only one of these is the handle. For example, if the RHS E + T were chosen to be rewritten in this sentential form, the resulting sentential form would be E * id, but E * id is not a legal right sentential form for the given grammar.
The handle of a right sentential form is unique. The task of a bottom-up parser is to find the handle of any given right sentential form that can be generated by its associated grammar. Formally, handle is defined as follows:
Definition: β is the handle of the right sentential form γ = αβw if and only if S =>*rm αAw =>rm αβw
In this definition, =>rm specifies a rightmost derivation step, and =>*rm specifies zero or more rightmost derivation steps. Although the definition of a handle is mathematically concise, it provides little help in finding the handle of a given right sentential form. In the following, we provide the definitions of several substrings of sentential forms that are related to handles. The purpose of these is to provide some intuition about handles.
Definition: β is a phrase of the right sentential form γ if and only if S =>* γ = α₁Aα₂ =>+ α₁βα₂
In this definition, =>+ means one or more derivation steps.
Definition: β is a simple phrase of the right sentential form γ if and only if S =>* γ = α₁Aα₂ => α₁βα₂
If these two definitions are compared carefully, it is clear that they differ only in the last derivation specification. The definition of phrase uses one or more steps, while the definition of simple phrase uses exactly one step.
The definitions of phrase and simple phrase may appear to have the same lack of practical value as that of a handle, but that is not true. Consider what a phrase is relative to a parse tree. It is the string of all of the leaves of the partial parse tree that is rooted at one particular internal node of the whole parse tree. A simple phrase is just a phrase that takes a single derivation step from its root nonterminal node. In terms of a parse tree, a phrase can be derived from a single nonterminal in one or more tree levels, but a simple phrase can be derived in just a single tree level. Consider the parse tree shown in Figure 4.3.
The leaves of the parse tree in Figure 4.3 comprise the sentential form E + T * id. Because there are three internal nodes, there are three phrases. Each internal node is the root of a subtree, whose leaves are a phrase. The root node of the whole parse tree, E, generates all of the resulting sentential form, E + T * id, which is a phrase. The internal node, T, generates the leaves T * id, which is another phrase. Finally, the internal node, F, generates id, which is also a phrase. So, the phrases of the sentential form E + T * id are E + T * id, T * id, and id. Notice that phrases are not necessarily RHSs in the underlying grammar.
The simple phrases are a subset of the phrases. In the previous example, the only simple phrase is id. A simple phrase is always a RHS in the grammar.
The reason for discussing phrases and simple phrases is this: The handle of any rightmost sentential form is its leftmost simple phrase. So now we have a highly intuitive way to find the handle of any right sentential form, assuming we have the grammar and can draw a parse tree. This approach to finding handles is of course not practical for a parser. (If you already have a parse tree, why do you need a parser?) Its only purpose is to provide the reader with some intuitive feel for what a handle is, relative to a parse tree, which is easier than trying to think about handles in terms of sentential forms.
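The parse-tree view of phrases, simple phrases, and handles can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Node` class and the function names are hypothetical, and the tree is built by hand for the sentential form E + T * id using the rules E → E + T, T → T * F, and F → id. A phrase is the leaf string below an internal node, a simple phrase comes from an internal node whose children are all leaves, and the handle is taken as the leftmost simple phrase.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """A parse-tree node; a node with no children is a leaf (hypothetical helper)."""
    symbol: str
    children: List["Node"] = field(default_factory=list)


def leaves(node: Node) -> List[str]:
    """Return the leaf symbols below `node`, left to right."""
    if not node.children:
        return [node.symbol]
    result = []
    for child in node.children:
        result.extend(leaves(child))
    return result


def internal_nodes(node: Node) -> List[Node]:
    """Return the internal nodes in preorder (root first, then left to right)."""
    if not node.children:
        return []
    result = [node]
    for child in node.children:
        result.extend(internal_nodes(child))
    return result


def phrases(root: Node) -> List[str]:
    """Every internal node contributes one phrase: the leaves of its subtree."""
    return [" ".join(leaves(n)) for n in internal_nodes(root)]


def simple_phrases(root: Node) -> List[str]:
    """A simple phrase comes from an internal node whose children are all leaves."""
    return [" ".join(leaves(n)) for n in internal_nodes(root)
            if all(not c.children for c in n.children)]


def handle(root: Node) -> str:
    """The handle is the leftmost simple phrase; preorder visits it first."""
    return simple_phrases(root)[0]


# Hand-built parse tree for the sentential form E + T * id.
tree = Node("E", [
    Node("E"), Node("+"),
    Node("T", [Node("T"), Node("*"), Node("F", [Node("id")])]),
])

print(phrases(tree))
print(simple_phrases(tree))
print(handle(tree))
```

As the text notes, this is not how a parser finds handles, because it presupposes the parse tree; it only makes the definitions concrete.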
We can now consider bottom-up parsing in terms of parse trees, although the purpose of a parser is to produce a parse tree. Given the parse tree for an entire sentence, you easily can find the handle, which is the first thing to rewrite in the sentence to get the previous sentential form. Then the handle can be pruned from the parse tree and the process repeated. Continuing to the root of the parse tree, the entire rightmost derivation can be constructed.
Bottom-up parsers are often called shift-reduce algorithms, because shift and reduce are the two most common actions they specify. An integral part of every bottom-up parser is a stack. As with other parsers, the input to a bottom-up parser is the stream of tokens of a program and the output is a sequence of grammar rules. The shift action moves the next input token onto the parser’s stack. A reduce action replaces an RHS (the handle) on top of the parser’s stack by its corresponding LHS. Every parser for a programming language is a pushdown automaton (PDA), because a PDA is a recognizer for a context-free language. You need not be intimate with PDAs to understand how a bottom-up parser works, although it helps. A PDA is a very simple mathematical machine that scans strings of symbols from left to right. A PDA is so named because it uses a pushdown stack as its memory. PDAs can be used as recognizers for context-free languages. Given a string of symbols over the alphabet of a context-free language, a PDA that is designed for the purpose can determine whether the string is or is not a sentence in the language. In the process, the PDA can produce the information needed to construct a parse tree for the sentence.
With a PDA, the input string is examined, one symbol at a time, left to right. The input is treated very much as if it were stored in another stack, because the PDA never sees more than the leftmost symbol of the input.
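To make the idea of a PDA as a recognizer more concrete, here is a minimal sketch for one small context-free language, the strings of balanced parentheses. The language and the function name `accepts_balanced` are assumptions chosen for illustration; they do not come from the text. The machine looks at one input symbol at a time, left to right, and its only memory is a pushdown stack.

```python
def accepts_balanced(symbols: str) -> bool:
    """Recognize balanced parentheses with a stack as the only memory."""
    stack = []
    for sym in symbols:              # examine the input one symbol at a time
        if sym == "(":
            stack.append(sym)        # push on an opening parenthesis
        elif sym == ")":
            if not stack:            # nothing to match: reject
                return False
            stack.pop()              # pop the matching opening parenthesis
        else:
            return False             # symbol is not in the alphabet
    return not stack                 # accept only if the stack is empty


print(accepts_balanced("(()())"))
print(accepts_balanced("(()"))
```

A parser for a programming language does more than answer yes or no, but the control structure is the same: a left-to-right scan driven by the top of a stack.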
Note that a recursive-descent parser is also a PDA. In that case, the stack is that of the run-time system, which records subprogram calls (among other things), which correspond to the nonterminals of the grammar.
Many different bottom-up parsing algorithms have been devised. Most of them are variations of a process called LR. LR parsers use a relatively small program and a parsing table that is built for a specific programming language. The original LR algorithm was designed by Donald Knuth (Knuth, 1965). This algorithm, which is sometimes called canonical LR, was not used in the years immediately following its publication because producing the required parsing table required large amounts of computer time and memory. Subsequently, several variations on the canonical LR table construction process were developed (DeRemer, 1971; DeRemer and Pennello, 1982). These are characterized by two properties: (1) They require far less computer resources to produce the required parsing table than the canonical LR algorithm, and (2) they work on smaller classes of grammars than the canonical LR algorithm.
There are three advantages to LR parsers:
They can be built for all programming languages.
They can detect syntax errors as soon as it is possible in a left-to-right scan.
The LR class of grammars is a proper superset of the class parsable by LL parsers (for example, many left recursive grammars are LR, but none are LL).
The only disadvantage of LR parsing is that it is difficult to produce by hand the parsing table for a given grammar for a complete programming language. This is not a serious disadvantage, however, for there are several programs available that take a grammar as input and produce the parsing table, as discussed later in this section.
Prior to the appearance of the LR parsing algorithm, there were a number of parsing algorithms that found handles of right sentential forms by looking both to the left and to the right of the substring of the sentential form that was suspected of being the handle. Knuth’s insight was that one could effectively look to the left of the suspected handle all the way to the bottom of the parse stack to determine whether it was the handle. But all of the information in the parse stack that was relevant to the parsing process could be represented by a single state, which could be stored on the top of the stack. In other words, Knuth discovered that regardless of the length of the input string, the length of the sentential form, or the depth of the parse stack, there were only a relatively small number of different situations, as far as the parsing process is concerned. Each situation could be represented by a state and stored in the parse stack, one state symbol for each grammar symbol on the stack. At the top of the stack would always be a state symbol, which represented the relevant information from the entire history of the parse, up to the current time. We will use subscripted uppercase S’s to represent the parser states.
Figure 4.4 shows the structure of an LR parser. The contents of the parse stack for an LR parser have the following form:
$S_0 X_1 S_1 X_2 \ldots X_m S_m$ (top)
where the S's are state symbols and the X's are grammar symbols. An LR parser configuration is a pair of strings (stack, input), with the detailed form
$(S_0 X_1 S_1 X_2 S_2 \ldots X_m S_m,\ a_i a_{i+1} \ldots a_n \$)$
Notice that the input string has a dollar sign at its right end. This sign is put there during initialization of the parser. It is used for normal termination of the parser. Using this parser configuration, we can formally define the LR parser process, which is based on the parsing table.
An LR parsing table has two parts, named ACTION and GOTO. The ACTION part of the table specifies most of what the parser does. It has state symbols as its row labels and the terminal symbols of the grammar as its column labels. Given a current parser state, which is represented by the state symbol on top of the parse stack, and the next symbol (token) of input, the parse table specifies what the parser should do. The two primary parser actions are shift and reduce. Either the parser shifts the next input symbol onto the parse stack, along with a state symbol, or it already has the handle on top of the stack, which it reduces to the LHS of the rule whose RHS is the same as the handle. Two other actions are possible: accept, which means the parser has successfully completed the parse of the input, and error, which means the parser has detected a syntax error.
The rows of the GOTO part of the LR parsing table have state symbols as labels. This part of the table has nonterminals as column labels. The values in the GOTO part of the table indicate which state symbol should be pushed onto the parse stack after a reduction has been completed, which means the handle has been removed from the parse stack and the new nonterminal has been pushed onto the parse stack. The specific symbol is found at the row whose label is the state symbol on top of the parse stack after the handle and its associated state symbols have been removed. The column of the GOTO table that is used is the one whose label is the LHS of the rule used in the reduction.
Consider the traditional grammar for arithmetic expressions that follows:
1. E → E + T
2. E → T
3. T → T * F
4. T → F
5. F → (E)
6. F → id
The rules of this grammar are numbered to provide a simple way to reference them in a parsing table.
Figure 4.5 shows the LR parsing table for this grammar. Abbreviations are used for the actions: R for reduce and S for shift. R4 means reduce using rule 4; S6 means shift the next symbol of input onto the stack and push state 6 onto the stack. Empty positions in the ACTION table indicate syntax errors. In a complete parser, these could have calls to error-handling routines.
LR parsing tables can easily be constructed using a software tool, such as yacc (Johnson, 1975), which takes the grammar as input. Although LR parsing tables can be produced by hand, for a grammar of a real programming language, the task would be lengthy, tedious, and error prone. For real compilers, LR parsing tables are always generated with software tools.
The initial configuration of an LR parser is
$(S_0,\ a_1 \ldots a_n \$)$
The parser actions are informally defined as follows:
The Shift process is simple: The next symbol of input is pushed onto the stack, along with the state symbol that is part of the Shift specification in the ACTION table.
For a Reduce action, the handle must be removed from the stack. Because for every grammar symbol on the stack there is a state symbol, the number of symbols removed from the stack is twice the number of symbols in the handle. After removing the handle and its associated state symbols, the LHS of the rule is pushed onto the stack. Finally, the GOTO table is used, with the row label being the symbol that was exposed when the handle and its state symbols were removed from the stack, and the column label being the nonterminal that is the LHS of the rule used in the reduction.
When the action is Accept, the parse is complete and no errors were found.
When the action is Error, the parser calls an error-handling routine.
Although there are many parsing algorithms based on the LR concept, they differ only in the construction of the parsing table. All LR parsers use this same parsing algorithm.
Perhaps the best way to become familiar with the LR parsing process is through an example. Initially, the parse stack has the single symbol 0, which represents state 0 of the parser. The input contains the input string with an end marker, in this case a dollar sign, attached to its right end. At each step, the parser actions are dictated by the top (rightmost in Figure 4.4) symbol of the parse stack and the next (leftmost in Figure 4.4) token of input. The correct action is chosen from the corresponding cell of the ACTION part of the parse table. The GOTO part of the parse table is used after a reduction action. Recall that GOTO is used to determine which state symbol is placed on the parse stack after a reduction.
Following is a trace of a parse of the string id + id * id, using the LR parsing algorithm and the parsing table shown in Figure 4.5.
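The trace can also be produced mechanically. The following Python sketch implements the table-driven algorithm described above and records one (stack, input, action) row per step. Figure 4.5 is not reproduced in this text, so the `ACTION` and `GOTO` dictionaries below are an assumption: they encode the standard SLR table for the six-rule expression grammar as given in Aho et al., which the figure is presumed to match. The names `RULES`, `ACTION`, `GOTO`, and `lr_parse` are hypothetical.

```python
# Grammar rules, numbered as in the text: rule number -> (LHS, RHS).
RULES = {
    1: ("E", ["E", "+", "T"]),
    2: ("E", ["T"]),
    3: ("T", ["T", "*", "F"]),
    4: ("T", ["F"]),
    5: ("F", ["(", "E", ")"]),
    6: ("F", ["id"]),
}

# ACTION part: rows are states, columns are terminals. "S5" means shift and
# push state 5, "R4" means reduce by rule 4, and a missing entry is an error.
# Assumed to match Figure 4.5 (standard SLR table for this grammar).
ACTION = {
    0: {"id": "S5", "(": "S4"},
    1: {"+": "S6", "$": "acc"},
    2: {"+": "R2", "*": "S7", ")": "R2", "$": "R2"},
    3: {"+": "R4", "*": "R4", ")": "R4", "$": "R4"},
    4: {"id": "S5", "(": "S4"},
    5: {"+": "R6", "*": "R6", ")": "R6", "$": "R6"},
    6: {"id": "S5", "(": "S4"},
    7: {"id": "S5", "(": "S4"},
    8: {"+": "S6", ")": "S11"},
    9: {"+": "R1", "*": "S7", ")": "R1", "$": "R1"},
    10: {"+": "R3", "*": "R3", ")": "R3", "$": "R3"},
    11: {"+": "R5", "*": "R5", ")": "R5", "$": "R5"},
}

# GOTO part: rows are states, columns are nonterminals.
GOTO = {
    0: {"E": 1, "T": 2, "F": 3},
    4: {"E": 8, "T": 2, "F": 3},
    6: {"T": 9, "F": 3},
    7: {"F": 10},
}


def lr_parse(tokens):
    """Return (accepted, trace); each trace row is (stack, input, action)."""
    stack = [0]                          # alternating states and grammar symbols
    remaining = list(tokens) + ["$"]     # end marker appended at initialization
    trace = []
    while True:
        state, lookahead = stack[-1], remaining[0]
        action = ACTION[state].get(lookahead)    # empty cell -> None -> error
        snapshot = ("".join(map(str, stack)), " ".join(remaining))
        if action is None:
            trace.append(snapshot + ("Error",))
            return False, trace
        if action == "acc":
            trace.append(snapshot + ("Accept",))
            return True, trace
        number = int(action[1:])
        if action[0] == "S":
            trace.append(snapshot + (f"Shift {number}",))
            stack += [remaining.pop(0), number]  # push the token, then the state
        else:
            lhs, rhs = RULES[number]
            del stack[-2 * len(rhs):]            # handle symbols plus their states
            exposed = stack[-1]                  # state exposed by the removal
            trace.append(
                snapshot + (f"Reduce {number} (use GOTO[{exposed}, {lhs}])",))
            stack += [lhs, GOTO[exposed][lhs]]   # push the LHS, then the GOTO state


accepted, trace = lr_parse(["id", "+", "id", "*", "id"])
for stack_text, input_text, action_text in trace:
    print(f"{stack_text:<14} {input_text:<16} {action_text}")
```

Note how the reduce branch removes twice as many stack entries as there are symbols in the handle, and how only the table contents, not the driver loop, would change for a different grammar.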
The algorithms to generate LR parsing tables from given grammars, which are described in Aho et al. (2006), are not overly complex but are beyond the scope of a book on programming languages. As stated previously, there are a number of different software systems available to generate LR parsing tables.
Syntax analysis is a common part of language implementation, regardless of the implementation approach used. Syntax analysis is normally based on a formal syntax description of the language being implemented. A context-free grammar, which is also called BNF, is the most common approach for describing syntax. The task of syntax analysis is usually divided into two parts: lexical analysis and syntax analysis. There are several reasons for separating lexical analysis—namely, simplicity, efficiency, and portability.
A lexical analyzer is a pattern matcher that isolates the small-scale parts of a program, which are called lexemes. Lexemes occur in categories, such as integer literals and names. These categories are called tokens. Each token is assigned a numeric code, which along with the lexeme is what the lexical analyzer produces. There are three distinct approaches to constructing a lexical analyzer: using a software tool to generate a table for a table-driven analyzer, building such a table by hand, and writing code to implement a state diagram description of the tokens of the language being implemented. The state diagram for tokens can be reasonably small if character classes are used for transitions, rather than having transitions for every possible character from every state node. Also, the state diagram can be simplified by using a table lookup to recognize reserved words.
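The two simplifications mentioned here, character classes and a table lookup for reserved words, can be illustrated with a small hand-written analyzer. This is a sketch, not the analyzer from Section 4.2: the token names are strings rather than numeric codes, the `RESERVED` set is a hypothetical example, and every character that is neither a letter nor a digit is treated as a single-character lexeme.

```python
RESERVED = {"if", "else", "while"}    # hypothetical reserved-word table


def char_class(ch: str) -> str:
    """Map a character to its class so transitions need not list every character."""
    if ch.isalpha():
        return "LETTER"
    if ch.isdigit():
        return "DIGIT"
    return "OTHER"


def lex(source: str):
    """Return a list of (token, lexeme) pairs for `source`."""
    tokens = []
    i = 0
    while i < len(source):
        ch = source[i]
        if ch.isspace():                  # skip white space between lexemes
            i += 1
            continue
        start = i
        cls = char_class(ch)
        if cls == "LETTER":               # name state: letters and digits follow
            while i < len(source) and char_class(source[i]) in ("LETTER", "DIGIT"):
                i += 1
            lexeme = source[start:i]
            # Table lookup instead of separate states for each reserved word.
            token = "RESERVED" if lexeme in RESERVED else "IDENT"
        elif cls == "DIGIT":              # integer-literal state: digits follow
            while i < len(source) and char_class(source[i]) == "DIGIT":
                i += 1
            lexeme = source[start:i]
            token = "INT_LIT"
        else:                             # single-character lexeme
            i += 1
            lexeme = ch
            token = "OTHER"
        tokens.append((token, lexeme))
    return tokens


print(lex("while sum1 < 47"))
```

With character classes, the name state needs two transitions (letter, digit) rather than one per character, which is the size reduction the paragraph describes.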
Syntax analyzers have two goals: to detect syntax errors in a given program and to produce a parse tree, or possibly only the information required to build such a tree, for a given program. Syntax analyzers are either top-down, meaning they construct leftmost derivations and a parse tree in top-down order, or bottom-up, in which case they construct the reverse of a rightmost derivation and a parse tree in bottom-up order. Parsers that work for all unambiguous grammars have complexity $O(n^3)$. However, parsers used for implementing syntax analyzers for programming languages work on subclasses of unambiguous grammars and have complexity $O(n)$.
A recursive-descent parser is an LL parser that is implemented by writing code directly from the grammar of the source language. EBNF is ideal as the basis for recursive-descent parsers. A recursive-descent parser has a subprogram for each nonterminal in the grammar. The code for a given grammar rule is simple if the rule has a single RHS. The RHS is examined left to right. For each nonterminal, the code calls the associated subprogram for that nonterminal, which parses whatever the nonterminal generates. For each terminal, the code compares the terminal with the next token of input. If they match, the code simply calls the lexical analyzer to get the next token. If they do not, the subprogram reports a syntax error. If a rule has more than one RHS, the subprogram must first determine which RHS it should parse. It must be possible to make this determination on the basis of the next token of input.
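The structure described above, one subprogram per nonterminal with terminals compared against the next token, can be sketched as follows. The EBNF rules in the comments are an assumed expression grammar used for illustration, and the class and method names are hypothetical; the parser works on a pre-split token list rather than calling a real lexical analyzer.

```python
class ParseError(Exception):
    """Raised when a subprogram detects a syntax error."""


class Parser:
    def __init__(self, tokens):
        self.tokens = list(tokens) + ["$"]   # "$" marks the end of input
        self.pos = 0

    @property
    def next_token(self):
        return self.tokens[self.pos]

    def lex(self):
        """Advance to the next token (stands in for the lexical analyzer)."""
        self.pos += 1

    def expr(self):
        # <expr> -> <term> {(+ | -) <term>}
        self.term()
        while self.next_token in ("+", "-"):
            self.lex()
            self.term()

    def term(self):
        # <term> -> <factor> {(* | /) <factor>}
        self.factor()
        while self.next_token in ("*", "/"):
            self.lex()
            self.factor()

    def factor(self):
        # <factor> -> id | ( <expr> )
        # Two RHSs: the next token alone decides which one to parse.
        if self.next_token == "id":
            self.lex()
        elif self.next_token == "(":
            self.lex()
            self.expr()
            if self.next_token != ")":
                raise ParseError("expected ')'")
            self.lex()
        else:
            raise ParseError(f"unexpected token {self.next_token!r}")


def parse(tokens) -> bool:
    """Return True if `tokens` form a complete expression."""
    parser = Parser(tokens)
    try:
        parser.expr()
    except ParseError:
        return False
    return parser.next_token == "$"       # all input must be consumed


print(parse(["id", "+", "id", "*", "id"]))
print(parse(["id", "+"]))
```

The braces of EBNF map directly onto `while` loops, which is one reason EBNF suits this style of parser.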
Two distinct grammar characteristics prevent the construction of a recursive-descent parser based on the grammar. One of these is left recursion. The process of eliminating direct left recursion from a grammar is relatively simple. Although we do not cover it, an algorithm exists to remove both direct and indirect left recursion from a grammar. The other problem is detected with the pairwise disjointness test, which tests whether a parsing subprogram can determine which RHS is being parsed on the basis of the next token of input. Some grammars that fail the pairwise disjointness test can be modified to pass it, using left factoring.
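The pairwise disjointness test can be sketched in code for a restricted case. The version below assumes grammars with no empty (epsilon) RHSs, so the FIRST set of an RHS depends only on its first symbol. The function names and both example grammars are illustrative assumptions, with lowercase strings as terminals and dictionary keys as nonterminals.

```python
def first_sets(grammar):
    """Compute FIRST for each nonterminal; assumes no empty RHSs."""
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:                        # iterate until nothing new is added
        changed = False
        for nt, rhss in grammar.items():
            for rhs in rhss:
                head = rhs[0]
                new = first[head] if head in grammar else {head}
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first


def passes_pairwise_disjointness(nt, grammar):
    """True if the FIRST sets of all RHSs of `nt` are pairwise disjoint."""
    first = first_sets(grammar)
    seen = set()
    for rhs in grammar[nt]:
        head = rhs[0]
        rhs_first = set(first[head]) if head in grammar else {head}
        if seen & rhs_first:              # overlaps an earlier RHS
            return False
        seen |= rhs_first
    return True


# A -> aB | bAb | Bb ; B -> cB | d   (FIRST sets {a}, {b}, {c, d})
passing = {"A": [["a", "B"], ["b", "A", "b"], ["B", "b"]],
           "B": [["c", "B"], ["d"]]}
# A -> aB | BAb ; B -> aB | b        (FIRST sets {a}, {a, b})
failing = {"A": [["a", "B"], ["B", "A", "b"]],
           "B": [["a", "B"], ["b"]]}

print(passes_pairwise_disjointness("A", passing))
print(passes_pairwise_disjointness("A", failing))
```

Under the same no-epsilon assumption, a directly left-recursive rule such as E → E + T | T also fails this check, because the FIRST set of the recursive RHS contains the FIRST set of the other one.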
The parsing problem for bottom-up parsers is to find the substring of the current sentential form that must be reduced to its associated LHS to get the next (previous) sentential form in the rightmost derivation. This substring is called the handle of the sentential form. A parse tree can provide an intuitive basis for recognizing a handle. A bottom-up parser is a shift-reduce algorithm, because in most cases it either shifts the next lexeme of input onto the parse stack or reduces the handle that is on top of the stack.
The LR family of shift-reduce parsers is the most commonly used bottom-up parsing approach for programming languages, because parsers in this family have several advantages over alternatives. An LR parser uses a parse stack, which contains grammar symbols and state symbols to maintain the state of the parser. The top symbol on the parse stack is always a state symbol that represents all of the information in the parse stack that is relevant to the parsing process. LR parsers use two parsing tables: ACTION and GOTO. The ACTION part specifies what the parser should do, given the state symbol on top of the parse stack and the next token of input. The GOTO table is used to determine which state symbol should be placed on the parse stack after a reduction has been done.
What are three reasons why syntax analyzers are based on grammars?
Explain the three reasons why lexical analysis is separated from syntax analysis.
Define lexeme and token.
What are the primary tasks of a lexical analyzer?
Describe briefly the three approaches to building a lexical analyzer.
What is a state transition diagram?
Why are character classes used, rather than individual characters, for the letter and digit transitions of a state diagram for a lexical analyzer?
What are the two distinct goals of syntax analysis?
Describe the differences between top-down and bottom-up parsers.
Describe the parsing problem for a top-down parser.
Describe the parsing problem for a bottom-up parser.
Explain why compilers use parsing algorithms that work on only a subset of all grammars.
Why are named constants used, rather than numbers, for token codes?
Describe how a recursive-descent parsing subprogram is written for a rule with a single RHS.
Explain the two grammar characteristics that prohibit them from being used as the basis for a top-down parser.
What is the FIRST set for a given grammar and sentential form?
Describe the pairwise disjointness test.
What is left factoring?
What is a phrase of a sentential form?
What is a simple phrase of a sentential form?
What is the handle of a sentential form?
What is the mathematical machine on which both top-down and bottom-up parsers are based?
Describe three advantages of LR parsers.
What was Knuth's insight in developing the LR parsing technique?
Describe the purpose of the ACTION table of an LR parser.
Describe the purpose of the GOTO table of an LR parser.
Is left recursion a problem for LR parsers?
Perform the pairwise disjointness test for the following grammar rules.
Perform the pairwise disjointness test for the following grammar rules.
Show a trace of the recursive descent parser given in Section 4.4.1 for the string a + b * c.
Show a trace of the recursive descent parser given in Section 4.4.1 for the string a * (b + c).
Given the following grammar and the right sentential form, draw a parse tree and show the phrases and simple phrases, as well as the handle.
aaAbb
bBab
aaAbBb
Given the following grammar and the right sentential form, draw a parse tree and show the phrases and simple phrases, as well as the handle.
aAcccbbc
AbcaBccb
baBcBbbc
Show a complete parse, including the parse stack contents, input string, and action for the string id * (id + id), using the grammar and parse table in Section 4.5.3.
Show a complete parse, including the parse stack contents, input string, and action for the string (id + id) * id, using the grammar and parse table in Section 4.5.3.
Obtain from Aho et al. (2006) the algorithm for removing indirect left recursion from a grammar. Use this algorithm to remove all left recursion from the following grammar:
Design a state diagram to recognize one form of the comments of the C-based programming languages, those that begin with /* and end with */.
Design a state diagram to recognize the floating-point literals of your favorite programming language.
Write and test the code to implement the state diagram of Problem 1.
Write and test the code to implement the state diagram of Problem 2.
Modify the lexical analyzer given in Section 4.2 to recognize the following list of reserved words and return their respective token codes: for (FOR_CODE, 30), if (IF_CODE, 31), else (ELSE_CODE, 32), while (WHILE_CODE, 33), do (DO_CODE, 34), int (INT_CODE, 35), float (FLOAT_CODE, 36), switch (SWITCH_CODE, 37).
Convert the lexical analyzer (which is written in C) given in Section 4.2 to Java.
Convert the recursive descent parser routines for <expr>, <term>, and <factor> given in Section 4.4.1 to Java.
For those rules that pass the test in Problem 1, write a recursive-descent parsing subprogram that parses the language generated by the rules. Assume you have a lexical analyzer named lex and an error-handling subprogram named error, which is called whenever a syntax error is detected.
For those rules that pass the test in Problem 2, write a recursive-descent parsing subprogram that parses the language generated by the rules. Assume you have a lexical analyzer named lex and an error-handling subprogram named error, which is called whenever a syntax error is detected.
Implement and test the LR parsing algorithm given in Section 4.5.3.
Write an EBNF rule that describes the while statement of Java or C++. Write the recursive-descent subprogram in Java or C++ for this rule.
Write an EBNF rule that describes the for statement of Java or C++. Write the recursive-descent subprogram in Java or C++ for this rule.
This chapter introduces the fundamental semantic issues of variables. It begins by describing the nature of names and special words in programming languages. The attributes of variables, including type, address, and value, are then discussed, including the issue of aliases. The important concepts of binding and binding times are introduced next, including the different possible binding times for variable attributes and how they define four different categories of variables. Following that, two very different scoping rules for names, static and dynamic, are described, along with the concept of a referencing environment of a statement. Finally, named constants and variable initialization are discussed.
Imperative programming languages are, to varying degrees, abstractions of the underlying von Neumann computer architecture. The architecture’s two primary components are its memory, which stores both instructions and data, and its processor, which provides operations for modifying the contents of the memory. The abstractions in a language for the memory cells of the machine are variables. In some cases, the characteristics of the abstractions are very close to the characteristics of the cells; an example of this is an integer variable, which is usually represented directly in one or more bytes of memory. In other cases, the abstractions are far removed from the organization of the hardware memory, as with a three-dimensional array, which requires a software mapping function to support the abstraction.
A variable can be characterized by a collection of properties, or attributes, the most important of which is type, a fundamental concept in programming languages. Designing the data types of a language requires that a variety of issues be considered. (Data types are discussed in Chapter 6.) Among the most important of these issues are the scope and lifetime of variables.
Functional programming languages allow expressions to be named. These named expressions appear like assignments to variable names in imperative languages, but are fundamentally different in that they cannot be changed. So, they are like the named constants of the imperative languages. Pure functional languages do not have variables that are like those of the imperative languages. However, many functional languages do include such variables.
In the remainder of this book, families of languages will often be referred to as if they were single languages. For example, Fortran will mean all of the versions of Fortran. This is also the case for Ada. Likewise, a reference to C will mean the original version of C, as well as C89 and C99. When a specific version of a language is named, it is because it is different from the other family members within the topic being discussed. If we add a plus sign (+) to the name of a version of a language, we mean all versions of the language beginning with the one named. For example, Fortran 95+ means all versions of Fortran beginning with Fortran 95. The phrase C-based languages will be used to refer to C, Objective-C, C++, Java, and C#.
Before beginning our discussion of variables, the design of one of the fundamental attributes of variables, names, must be covered. Names are also associated with subprograms, formal parameters, and other program constructs. The term identifier is often used interchangeably with name.
The following are the primary design issues for names:
Are names case sensitive?
Are the special words of the language reserved words or keywords?
These issues are discussed in the following two subsections, which also include examples of several design choices.
A name is a string of characters used to identify some entity in a program.
C99 has no length limitation on its internal names, but only the first 63 are significant. External names in C99 (those defined outside functions, which must be handled by the linker) are restricted to 31 characters. Names in Java and C# have no length limit, and all characters in them are significant. C++ does not specify a length limit on names, although implementors sometimes do.
Names in most programming languages have the same form: a letter followed by a string consisting of letters, digits, and underscore characters ( _ ). Although underscore characters were widely used to form names in the 1970s and 1980s, that practice is now far less popular. In the C-based languages, it has to a large extent been replaced by the so-called camel notation, in which all of the words of a multiple-word name except the first are capitalized, as in myStack.2 Note that the use of underscores and mixed case in names is a programming style issue, not a language design issue.
The earliest programming languages used single-character names. This notation was natural because early programming was primarily mathematical, and mathematicians have long used single-character names for unknowns in their formal notations.
Fortran I broke with the tradition of the single-character name, allowing up to six characters in its names.
All variable names in PHP must begin with a dollar sign. In Perl, the special character at the beginning of a variable’s name, $, @, or %, specifies its type (although in a different sense than in other languages). In Ruby, special characters at the beginning of a variable’s name, @ or @@, indicate that the variable is an instance or a class variable, respectively.
In many languages, notably the C-based languages, uppercase and lowercase letters in names are distinct; that is, names in these languages are case sensitive. For example, the following three names are distinct in C++: rose, ROSE, and Rose. To some people, this is a serious detriment to readability, because names that look very similar in fact denote different entities. In that sense, case sensitivity violates the design principle that language constructs that look similar should have similar meanings. But in languages whose variable names are case-sensitive, although Rose and rose look similar, there is no connection between them.
Obviously, not everyone agrees that case sensitivity is bad for names. In C, the problems of case sensitivity are avoided by the convention that variable names do not include uppercase letters. In Java and C#, however, the problem cannot be escaped because many of the predefined names include both uppercase and lowercase letters. For example, the Java method for converting a string to an integer value is parseInt, and spellings such as ParseInt and parseint are not recognized. This is a problem of writability rather than readability, because the need to remember specific case usage makes it more difficult to write correct programs. It is a kind of intolerance on the part of the language designer, which is enforced by the compiler.
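The point about case sensitivity can be tried out in any case-sensitive language. The following is a minimal Python sketch (Python is case sensitive, like the C-based languages); the names and values are illustrative only. One difference from the Java situation described above is assumed here and worth noting: Python reports the misspelled predefined name at run time rather than at compile time.

```python
# Three names that differ only in case are three separate variables.
rose = "lowercase"
ROSE = "uppercase"
Rose = "capitalized"


def distinct_names():
    """Return the values bound to the three similarly spelled names."""
    return rose, ROSE, Rose


def misspelled_builtin():
    """Writing Int instead of int fails, much like ParseInt in Java."""
    try:
        return Int("42")  # the predefined name is spelled int, not Int
    except NameError:
        return None


print(distinct_names())
print(int("42"), misspelled_builtin())
```

The writability cost is visible in the second function: the programmer must remember the exact spelling, including case, of every predefined name.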
Special words in programming languages are used to make programs more readable by naming actions to be performed. They also are used to separate the syntactic parts of statements and programs. In most languages, special words are classified as reserved words, which means they cannot be redefined by programmers. But in some, such as Fortran, they are only keywords, which means they can be redefined.
A reserved word is a special word of a programming language that cannot be used as a name. There is one potential problem with reserved words: If the language includes a large number of reserved words, the user may have difficulty making up names that are not reserved. The best example of this is COBOL, which has 300 reserved words. Unfortunately, some of the names most commonly chosen by programmers are in the list of reserved words, for example, LENGTH, BOTTOM, DESTINATION, and COUNT.
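As a small illustration of what "reserved" means in practice, the following Python sketch asks the Python compiler whether a given word may be used as a variable name. The helper name usable_as_name is hypothetical; keyword.iskeyword and compile are standard Python. Python is used here only as a convenient stand-in; it says nothing about COBOL's actual list.

```python
import keyword


def usable_as_name(word):
    """Return True if `word` may appear as the target of an assignment."""
    try:
        compile(f"{word} = 0", "<sketch>", "exec")
        return True
    except SyntaxError:
        return False


# Names such as length and count are reserved in COBOL but free in Python.
for word in ("for", "while", "length", "count"):
    print(word, keyword.iskeyword(word), usable_as_name(word))
```

A language with few reserved words leaves common names such as length and count available to the programmer.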
In program code examples in this book, reserved words are presented in boldface.
In most languages, names that are defined in other program units, such as Java packages and C and C++ libraries, can be made visible to a program. These names are predefined, but visible only if explicitly imported. Once imported, they cannot be redefined.
A program variable is an abstraction of a computer memory cell or collection of cells. Programmers often think of variables as names for memory locations, but there is much more to a variable than just a name.
The move from machine languages to assembly languages was largely one of replacing absolute numeric memory addresses for data with names, making programs far more readable and therefore easier to write and maintain. Assembly languages also provided an escape from the problem of manual absolute addressing, because the translator that converted the names to actual addresses also chose those addresses.
A variable can be characterized as a sextuple of attributes: (name, address, value, type, lifetime, and scope). Although this may seem too complicated for such an apparently simple concept, it provides the clearest way to explain the various aspects of variables.
Our discussion of variable attributes will lead to examinations of the important related concepts of aliases, binding, binding times, declarations, scoping rules, and referencing environments.
The name, address, type, and value attributes of variables are discussed in the following subsections. The lifetime and scope attributes are discussed in Sections 5.4.3 and 5.5, respectively.
Variable names are the most common names in programs. They were discussed at length in Section 5.2 in the general context of entity names in programs. Most variables have names. The ones that do not are discussed in Section 5.4.3.3.
The address of a variable is the machine memory address with which it is associated. This association is not as simple as it may appear. In many languages, it is possible for the same variable to be associated with different addresses at different times during the execution of the program. For example, if a subprogram has a local variable that is allocated from the run-time stack when the subprogram is called, different calls may result in that variable having different addresses. These are in a sense different instantiations of the same variable.
The process of associating variables with addresses is further discussed in Section 5.4.3. An implementation model for subprograms and their activations is discussed in Chapter 10.
The address of a variable is sometimes called its l-value, because the address is what is required when the name of a variable appears on the left side of an assignment.
It is possible to have multiple variables that have the same address. When more than one variable name can be used to access the same memory location, the variables are called aliases. Aliasing is a hindrance to readability because it allows a variable to have its value changed by an assignment to a different variable. For example, if variables named total and sum are aliases, any change to the value of total also changes the value of sum and vice versa. A reader of the program must always remember that total and sum are different names for the same memory cell. Because there can be any number of aliases in a program, this can be very difficult in practice. Aliasing also makes program verification more difficult.
Aliases can be created in programs in several different ways. One common way in C and C++ is with their union types. Unions are discussed at length in Chapter 6.
Two pointer variables are aliases when they point to the same memory location. The same is true for reference variables. This kind of aliasing is simply a side effect of the nature of pointers and references. When a C++ pointer is set to point at a named variable, the pointer, when dereferenced, and the variable’s name are aliases.
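Aliasing through references can be seen in a few lines. The following Python sketch is only an illustration: a one-element list stands in for a memory cell, and the names total and sum_ echo the earlier example (sum_ is spelled with an underscore to avoid hiding Python's built-in sum).

```python
def alias_demo():
    total = [0]      # a one-element list stands in for a memory cell
    sum_ = total     # sum_ now refers to the same object: an alias
    sum_[0] = 25     # a change made through one name...
    return total[0], total is sum_   # ...is visible through the other


def copy_demo():
    total = [0]
    sum_ = list(total)   # an independent copy, not an alias
    sum_[0] = 25
    return total[0], total is sum_


print(alias_demo())
print(copy_demo())
```

A reader who sees only the assignment to sum_ has no local clue that total has changed, which is exactly the readability hazard described above.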
Aliasing can be created in many languages through subprogram parameters. These kinds of aliases are discussed in Chapter 9.
The time when a variable becomes associated with an address is very important to an understanding of programming languages. This subject is discussed in Section 5.4.3.
The type of a variable determines the range of values the variable can store and the set of operations that are defined for values of the type. For example, the int type in Java specifies a value range of -2147483648 to 2147483647 and arithmetic operations for addition, subtraction, multiplication, division, and modulus.
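The range of Java's int is that of a 32-bit two's-complement integer, and Java arithmetic on int wraps around silently when a result leaves that range. The following Python sketch models this behavior; the helper names to_int32 and java_int_add are hypothetical, and Python's own integers are unbounded, which is why the wrapping must be written out.

```python
INT_MIN = -2**31       # -2147483648
INT_MAX = 2**31 - 1    #  2147483647


def to_int32(n):
    """Wrap an arbitrary integer into the 32-bit two's-complement range."""
    return (n - INT_MIN) % 2**32 + INT_MIN


def java_int_add(a, b):
    """Model Java's int addition, which wraps around on overflow."""
    return to_int32(a + b)


print(java_int_add(INT_MAX, 1))   # wraps to INT_MIN
print(java_int_add(INT_MIN, -1))  # wraps to INT_MAX
```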
The value of a variable is the contents of the memory cell or cells associated with the variable. It is convenient to think of computer memory in terms of abstract cells, rather than physical cells. The physical cells, or individually addressable units, of most contemporary computer memories are eight-bit units called bytes. A byte size is too small for most program variables. An abstract memory cell has the size required by the variable with which it is associated. For example, although floating-point values may occupy four physical bytes in a particular implementation of a particular language, a floating-point value is thought of as occupying a single abstract memory cell. The value of each simple nonstructured type is considered to occupy a single abstract cell. Henceforth, the term memory cell will mean abstract memory cell.
A variable’s value is sometimes called its r-value because it is what is required when the name of the variable appears on the right side of an assignment statement. To access the r-value, the l-value must be determined first. Such determinations are not always simple. For example, scoping rules can greatly complicate matters, as is discussed in Section 5.5.
A binding is an association between an attribute and an entity, such as between a variable and its type or value, or between an operation and a symbol. The time at which a binding takes place is called binding time. Binding and binding times are prominent concepts in the semantics of programming languages. Bindings can take place at language design time, language implementation time, compile time, load time, link time, or run time. For example, the asterisk symbol (*) is usually bound to the multiplication operation at language design time. A data type, such as int in C, is bound to a range of possible values at language implementation time. At compile time, a variable in a Java program is bound to a particular data type. A variable may be bound to a storage cell when the program is loaded into memory. That same binding does not happen until run time in some cases, as with variables declared in Java methods. A call to a library subprogram is bound to the subprogram code at link time.
Consider the following C++ assignment statement:
count = count + 5;
Some of the bindings and their binding times for the parts of this assignment statement are as follows:
The type of count is bound at compile time.
The set of possible values of count is bound at compiler design time.
The meaning of the operator symbol + is bound at compile time, when the types of its operands have been determined.
The internal representation of the literal 5 is bound at compiler design time.
The value of count is bound at execution time with this statement.
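For contrast with the C++ bindings listed above, the following Python sketch shows a language in which the meaning of the operator symbol + is not bound until run time, because the operand types are not known earlier. The function name combine is hypothetical.

```python
def combine(a, b):
    # The operation that + denotes is chosen at run time,
    # according to the types of the values a and b currently hold.
    return a + b


print(combine(2, 5))            # integer addition
print(combine(2.5, 5))          # floating-point addition
print(combine("count", "5"))    # string concatenation
print(combine([1], [5]))        # list concatenation
```

The same source text thus denotes four different operations, a choice that the C++ compiler would have made once, before execution.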
A complete understanding of the binding times for the attributes of program entities is a prerequisite for understanding the semantics of a programming language. For example, to understand what a subprogram does, one must understand how the actual parameters in a call are bound to the formal parameters in its definition. To determine the current value of a variable, it may be necessary to know when the variable was bound to storage and with which statement or statements.
A binding is static if it first occurs before run time begins and remains unchanged throughout program execution. If the binding first occurs during run time or can change in the course of program execution, it is called dynamic. The physical binding of a variable to a storage cell in a virtual memory environment is complex, because the page or segment of the address space in which the cell resides may be moved in and out of memory many times during program execution. In a sense, such variables are bound and unbound repeatedly. These bindings, however, are maintained by computer hardware, and the changes are invisible to the program and the user. Because they are not important to the discussion, we are not concerned with these hardware bindings. The essential point is to distinguish between static and dynamic bindings.
Before a variable can be referenced in a program, it must be bound to a data type. The two important aspects of this binding are how the type is specified and when the binding takes place. Types can be specified statically through some form of explicit or implicit declaration.
An explicit declaration is a statement in a program that lists variable names and specifies that they are a particular type. An implicit declaration is a means of associating variables with types through default conventions, rather than declaration statements. In this case, the first appearance of a variable name in a program constitutes its implicit declaration. Both explicit and implicit declarations create static bindings to types.
Most widely used programming languages that use static type binding exclusively and were designed since the mid-1960s require explicit declarations of all variables (Visual Basic, ML, C#, and Swift are some exceptions).
Implicit variable type binding is done by the language processor, either a compiler or an interpreter. There are several different bases for implicit variable type bindings. The simplest of these is naming conventions. In this case, the compiler or interpreter binds a variable to a type based on the syntactic form of the variable’s name.
Although they are a minor convenience to programmers, implicit declarations can be detrimental to reliability because they prevent the compilation process from detecting some typographical and programmer errors.
Some of the problems with implicit declarations can be avoided by requiring names for specific types to begin with particular special characters. For example, in Perl any name that begins with $ is a scalar, which can store either a string or a numeric value. If a name begins with @, it is an array; if it begins with a %, it is a hash structure.3 This creates different namespaces for different type variables. In this scenario, the names @apple and %apple are unrelated, because each is from a different namespace. Furthermore, a program reader always knows the type of a variable when reading its name.
Another kind of implicit type declaration uses context. This is sometimes called type inference. In the simpler case, the context is the type of the value assigned to the variable in a declaration statement. For example, in C# a var declaration of a variable must include an initial value, whose type is taken as the type of the variable. Consider the following declarations:
var sum = 0;
var total = 0.0;
var name = "Fred";
The types of sum, total, and name are int, double, and string, respectively (in C#, a real literal without a suffix, such as 0.0, has type double). Keep in mind that these are statically typed variables: their types are fixed for the lifetime of the unit in which they are declared.
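The idea behind this simple form of inference, taking the variable's type from the type of its initializer, can be modeled in a few lines. The following Python sketch is a toy only: it borrows Python's literal parser as a stand-in for C# literal syntax, handles nothing beyond simple literals, and the table CSHARP_NAMES and the function infer_type are hypothetical names. It is not a description of how the C# compiler is implemented.

```python
import ast

# Assumed mapping from Python literal types to C# type names (illustrative).
# An unsuffixed real literal such as 0.0 has type double in C#.
CSHARP_NAMES = {int: "int", float: "double", str: "string"}


def infer_type(initializer):
    """Return the C# type name a var declaration would get from a literal."""
    value = ast.literal_eval(initializer)
    return CSHARP_NAMES[type(value)]


for name, init in (("sum", "0"), ("total", "0.0"), ("name", '"Fred"')):
    print(f"var {name} = {init};  // inferred type: {infer_type(init)}")
```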
Visual Basic, Swift, and the functional languages ML, Haskell, OCaml, and F# also use type inferencing.
With dynamic type binding, the type of a variable is not specified by a declaration statement, nor can it be determined by the spelling of its name. Instead, the variable is bound to a type when it is assigned a value in an assignment statement. When the assignment statement is executed, the variable being assigned is bound to the type of the value of the expression on the right side of the assignment. Such an assignment may also bind the variable to an address and a memory cell, because different type values may require different amounts of storage. Any variable can be assigned any type value. Furthermore, a variable’s type can change any number of times during program execution. It is important to realize that the type of a variable whose type is dynamically bound may be temporary.
When the type of a variable is statically bound, the name of the variable can be thought of as being bound to a type, in the sense that the type and name of a variable are simultaneously bound. However, when a variable’s type is dynamically bound, its name can be thought of as being only temporarily bound to a type. In reality, the names of variables are never bound to types. Names can be bound to variables and variables can be bound to types.
Languages in which types are dynamically bound are dramatically different from those in which types are statically bound. The primary advantage of dynamic binding of variables to types is that it provides more programming flexibility. For example, a program to process numeric data in a language that uses dynamic type binding can be written as a generic program, meaning that it is capable of dealing with data of any numeric type. Whatever type data is input will be acceptable, because the variables in which the data are to be stored can be bound to the correct type when the data is assigned to the variables after input. By contrast, because of static binding of types, one cannot write a C program to process data without knowing the type of that data.
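The flexibility described above can be illustrated with a short Python sketch. The function average is hypothetical; it is written once and, because its variables are bound to types only when values are assigned, it works for any numeric type that supports addition and division.

```python
from fractions import Fraction


def average(values):
    """Average a non-empty sequence of numbers of any one numeric type."""
    total = 0
    for v in values:
        total = total + v   # total is rebound to whatever type the sum has
    return total / len(values)


print(average([1, 2, 3]))                          # integers
print(average([1.5, 2.5]))                         # floating-point values
print(average([Fraction(1, 2), Fraction(3, 2)]))   # exact rationals
```

No declaration in average mentions a type, so the same text serves integers, floating-point values, and rationals.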
Before the mid-1990s, the most commonly used programming languages used static type binding, the primary exceptions being some functional languages such as Lisp. However, since then there has been a significant shift to languages that use dynamic type binding. In Python, Ruby, JavaScript, and PHP, type binding is dynamic. For example, a JavaScript script may contain the following statement:
list = [10.2, 3.5];
Regardless of the previous type of the variable named list, this assignment causes it to become the name of a single-dimensioned array of length 2. If the statement
list = 47;
followed the previous example assignment, list would become the name of a scalar variable.
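The same behavior can be observed in Python, which also binds types dynamically. The following sketch records the type of one variable after each of several assignments; the variable is called items rather than list only to avoid hiding Python's built-in list type, and the function name type_history is hypothetical.

```python
def type_history():
    """Record the type bound to one variable after each assignment."""
    history = []
    items = [10.2, 3.5]                 # bound to a list of length 2
    history.append(type(items).__name__)
    items = 47                          # the same variable is now an integer
    history.append(type(items).__name__)
    items = "forty-seven"               # and now a string
    history.append(type(items).__name__)
    return history


print(type_history())
```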
The option of dynamic type binding was included in C# 4.0, released in 2010. A variable can be declared to use dynamic type binding by including the dynamic contextual keyword in its declaration, as in the following example:
dynamic any;
This is similar to, although also different from, declaring any to have type object. It is similar in that any can be assigned a value of any type, just as if it were declared object. It is different in that, unlike object, it is useful in several situations of interoperation; for example, with dynamically typed languages such as IronPython and IronRuby (.NET versions of Python and Ruby, respectively). It is also useful when data of unknown type come into a program from an external source. Class members, properties, method parameters, method return values, and local variables can all be declared dynamic.
In pure object-oriented languages—for example, Ruby—all variables are references and do not have types; all data are objects and any variable can reference any object. Variables in such languages are, in a sense, all the same type—they are references. However, unlike the references in Java, which are restricted to referencing one specific type of value, variables in Ruby can reference any object.
There are two disadvantages to dynamic type binding. First, it causes programs to be less reliable, because the error-detection capability of the compiler is diminished relative to a compiler for a language with static type bindings. Dynamic type binding allows any variable to be assigned a value of any type. Incorrect types of right sides of assignments are not detected as errors; rather, the type of the left side is simply changed to the incorrect type. For example, suppose that in a particular JavaScript program, i and x are currently the names of scalar numeric variables and y is currently the name of an array. Furthermore, suppose that the program needs the assignment statement
i = x;
but because of a keying error, it has the assignment statement
i = y;
In JavaScript (or any other language that uses dynamic type binding), no error is detected in this statement by the interpreter—the type of the variable named i is simply changed to an array. But later uses of i will expect it to be a scalar, and correct results will be impossible. In a language with static type binding, such as Java, the compiler would detect the error in the assignment i = y, and the program would not get to execution.
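The keying error can be reproduced in Python, another language with dynamic type binding. In the following sketch (the function name scaled and the values are hypothetical), the faulty assignment is accepted without complaint. In this particular case a later use of i does fail, but only at run time; note that the multiplication i * 2 silently succeeds on the list, so a different later use could just as well have produced a wrong result with no error at all.

```python
def scaled(use_typo):
    x = 3.0
    y = [1, 2, 3]
    i = y if use_typo else x   # the typo assigns y where x was intended
    try:
        # i * 2 works for a number and for a list alike; only + 1 fails.
        return i * 2 + 1
    except TypeError as err:
        return f"run-time error: {type(err).__name__}"


print(scaled(use_typo=False))
print(scaled(use_typo=True))
```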
Note that this disadvantage is also present to some extent in some languages that use static type binding, such as C and C++, which in many cases automatically convert the type of the RHS of an assignment to the type of the LHS.
Perhaps the greatest disadvantage of dynamic type binding is cost. The cost of implementing dynamic attribute binding is considerable, particularly in execution time. Type checking must be done at run time. Furthermore, every variable must have a run-time descriptor associated with it to maintain the current type. The storage used for the value of a variable must be of varying size, because different type values require different amounts of storage.
Finally, languages that have dynamic type binding for variables are usually implemented using pure interpreters rather than compilers. Computers do not have instructions whose operand types are not known at compile time. Therefore, a compiler cannot build machine instructions for the expression a + b if the types of a and b are not known at compile time. Pure interpretation typically takes at least 10 times as long as it does to execute equivalent machine code. Of course, if a language is implemented with a pure interpreter, the time to do dynamic type binding is hidden by the overall time of interpretation, so it seems less costly in that environment. On the other hand, languages with static type bindings are seldom implemented by pure interpretation, because programs in these languages can be easily translated to very efficient machine code versions.
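The run-time cost of dynamic type binding can be made concrete with a toy model of what an interpreter must do to evaluate a + b. In the following Python sketch, the class Cell and the function dynamic_add are hypothetical: each variable carries a run-time descriptor (type_tag) alongside its value, and every addition must inspect both descriptors before the appropriate operation can be chosen. Real interpreters are far more elaborate; this shows only the principle.

```python
class Cell:
    """A variable's storage: a run-time type descriptor plus a value."""

    def __init__(self, type_tag, value):
        self.type_tag = type_tag
        self.value = value


def dynamic_add(a, b):
    """Evaluate a + b by checking the operands' descriptors at run time."""
    numeric = ("int", "float")
    if a.type_tag == "int" and b.type_tag == "int":
        return Cell("int", a.value + b.value)
    if a.type_tag in numeric and b.type_tag in numeric:
        return Cell("float", float(a.value) + float(b.value))
    if a.type_tag == "string" and b.type_tag == "string":
        return Cell("string", a.value + b.value)
    raise TypeError(f"cannot add {a.type_tag} and {b.type_tag}")


result = dynamic_add(Cell("int", 2), Cell("float", 3.5))
print(result.type_tag, result.value)
```

With static type binding, the tests in dynamic_add are performed once by the compiler, which then emits a single machine instruction for the addition.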
The fundamental character of an imperative programming language is in large part determined by the design of the storage bindings for its variables. It is therefore important to have a clear understanding of these bindings.
The memory cell to which a variable is bound somehow must be taken from a pool of available memory. This process is called allocation. Deallocation is the process of placing a memory cell that has been unbound from a variable back into the pool of available memory.
The lifetime of a variable is the time during which the variable is bound to a specific memory location. So, the lifetime of a variable begins when it is bound to a specific cell and ends when it is unbound from that cell. To investigate storage bindings of variables, it is convenient to separate scalar (unstructured) variables into four categories, according to their lifetimes. These categories are named static, stack-dynamic, explicit heap-dynamic, and implicit heap-dynamic. In the following sections, we discuss the definitions of these four categories, along with their purposes, advantages, and disadvantages.
A static variable is one that is bound to a memory cell before program execution begins and remains bound to that same memory cell until program execution terminates. Statically bound variables have several valuable applications in programming. Globally accessible variables are often used throughout the execution of a program, thus making it necessary to have them bound to the same storage during that execution. Sometimes it is convenient to have subprograms that are history sensitive. Such a subprogram must have local static variables.
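A history-sensitive subprogram is one whose result depends on its earlier calls. Python has no static local variables, so the following sketch only emulates the behavior, using a closure whose variable calls survives between calls. This is an assumption-laden stand-in: the storage is not allocated before execution begins, as true static storage would be. The names make_counter and next_id are hypothetical.

```python
def make_counter():
    calls = 0   # survives between calls, like a static local variable in C

    def next_id():
        nonlocal calls
        calls += 1      # the value left by the previous call is still there
        return calls

    return next_id


next_id = make_counter()
print(next_id(), next_id(), next_id())
```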
One advantage of static variables is efficiency. All addressing of static variables can be direct;4 other kinds of variables often require indirect addressing, which is slower. Also, no run-time overhead is incurred for allocation and deallocation of static variables, although this time is often negligible.
One disadvantage of static binding to storage is reduced flexibility; in particular, a language that has only static variables cannot support recursive subprograms. Another disadvantage is that storage cannot be shared among variables. For example, suppose a program has two subprograms, both of which require large arrays. Furthermore, suppose that the two subprograms are never active at the same time. If the arrays are static, they cannot share the same storage.
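Why static storage alone cannot support recursion can be seen by simulating it. In the following Python sketch, the module-level variable _n plays the role of a single statically allocated cell shared by every activation of fact_static, whereas fact_local gives each activation its own local n. Both function names are hypothetical.

```python
_n = 0   # one cell shared by all activations, as static storage would be


def fact_static(n):
    global _n
    _n = n
    if _n <= 1:
        return 1
    rest = fact_static(_n - 1)   # the inner call overwrites _n
    return _n * rest             # _n no longer holds this call's argument


def fact_local(n):
    if n <= 1:
        return 1
    return n * fact_local(n - 1)   # each activation has its own n


print(fact_static(4), fact_local(4))
```

By the time each outer activation of fact_static resumes, the shared cell holds 1, so every multiplication uses the wrong value and the result for 4 is 1 rather than 24.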
C and C++ allow programmers to include the static specifier on a variable definition in a function, making the variables it defines static. Note that when the static modifier appears in the declaration of a variable in a class definition in C++, Java, and C#, it also implies that the variable is a class variable, rather than an instance variable. Class variables are created statically some time before the class is first instantiated.
Stack-dynamic variables are those whose storage bindings are created when their declaration statements are elaborated, but whose types are statically bound. Elaboration of such a declaration refers to the storage allocation and binding process indicated by the declaration, which takes place when execution reaches the code to which the declaration is attached. Therefore, elaboration occurs during run time. For example, the variable declarations that appear at the beginning of a Java method are elaborated when the method is called and the variables defined by those declarations are deallocated when the method completes its execution.
As their name indicates, stack-dynamic variables are allocated from the run-time stack.
Some languages—for example, C++ and Java—allow variable declarations to occur anywhere a statement can appear. In some implementations of these languages, all of the stack-dynamic variables declared in a function or method (not including those declared in nested blocks) may be bound to storage at the beginning of execution of the function or method, even though the declarations of some of these variables do not appear at the beginning. In such cases, the variable becomes visible at the declaration, but the storage binding (and initialization, if it is specified in the declaration) occurs when the function or method begins execution. The fact that storage binding of a variable takes place before it becomes visible does not affect the semantics of the language.
The advantages of stack-dynamic variables are as follows: To be useful, at least in most cases, recursive subprograms require some form of dynamic local storage so that each active copy of the recursive subprogram has its own version of the local variables. These needs are conveniently met by stack-dynamic variables. Even in the absence of recursion, having stack-dynamic local storage for subprograms is not without merit, because all subprograms share the same memory space for their locals.
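The need for per-activation local storage can be seen in a small Python sketch. The function name `depth_trace` and its `seen` parameter are hypothetical, chosen only for illustration, and Python's activation records are an implementation detail that need not live on a hardware stack. The point is only that every active copy of the recursive function has its own version of `local`.

```python
def depth_trace(n, seen=None):
    """Return the values of `local` observed as the recursion unwinds."""
    if seen is None:
        seen = []
    local = n * 10              # each activation gets its own `local`
    if n > 0:
        depth_trace(n - 1, seen)
    # The deeper activations have finished, yet this activation's
    # `local` still holds the value it was given above.
    seen.append(local)
    return seen


print(depth_trace(3))
```

If all activations shared one statically allocated `local`, the value appended after the recursive call returned would be whatever the deepest call last stored there.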
The disadvantages, relative to static variables, of stack-dynamic variables are the run-time overhead of allocation and deallocation, possibly slower accesses because indirect addressing is required, and the fact that subprograms cannot be history sensitive. The time required to allocate and deallocate stack-dynamic variables is not significant, because all of the stack-dynamic variables that are declared at the beginning of a subprogram are allocated and deallocated together, rather than by separate operations.
In Java, C++, and C#, variables defined in methods are by default stack dynamic.
All attributes other than storage are statically bound to stack-dynamic scalar variables. That is not the case for some structured types, as is discussed in Chapter 6. Implementation of allocation/deallocation processes for stack-dynamic variables is discussed in Chapter 10.
Explicit heap-dynamic variables are nameless (abstract) memory cells that are allocated and deallocated by explicit run-time instructions written by the programmer. These variables, which are allocated from and deallocated to the heap, can only be referenced through pointer or reference variables. The heap is a collection of storage cells that is highly disorganized because its use is unpredictable. The pointer or reference variable that is used to access an explicit heap-dynamic variable is created as any other scalar variable. An explicit heap-dynamic variable is created by either an operator (for example, in C++) or a call to a system subprogram provided for that purpose (for example, in C).
In C++, the allocation operator, named new, uses a type name as its operand. When executed, an explicit heap-dynamic variable of the operand type is created and its address is returned. Because an explicit heap-dynamic variable is bound to a type at compile time, that binding is static. However, such variables are bound to storage at the time they are created, which is during run time.
In addition to a subprogram or operator for creating explicit heap-dynamic variables, some languages include a subprogram or operator for explicitly destroying them.
As an example of explicit heap-dynamic variables, consider the following C++ code segment:
int *intnode;        // Create a pointer
intnode = new int;   // Create the heap-dynamic variable
. . .
delete intnode;      // Deallocate the heap-dynamic variable
                     // to which intnode points
In this example, an explicit heap-dynamic variable of int type is created by the new operator. This variable can then be referenced through the pointer, intnode. Later, the variable is deallocated by the delete operator. C++ requires the explicit deallocation operator delete, because it does not use implicit storage reclamation, such as garbage collection.
In Java, all data except the primitive scalars are objects. Java objects are explicit heap dynamic and are accessed through reference variables. Java has no way of explicitly destroying a heap-dynamic variable; rather, implicit garbage collection is used. Garbage collection is discussed in Chapter 6.
C# has both explicit heap-dynamic and stack-dynamic objects, all of which are implicitly deallocated. In addition, C# supports C++-style pointers. Such pointers are used to reference heap, stack, and even static variables and objects. These pointers have the same dangers as those of C++, and the objects they reference on the heap are not implicitly deallocated. Pointers are included in C# to allow C# components to interoperate with C and C++ components. To discourage their use, and also to make clear to any program reader that the code uses pointers, the header of any method that defines a pointer must include the reserved word unsafe.
Explicit heap-dynamic variables are often used to construct dynamic structures, such as linked lists and trees, that need to grow and/or shrink during execution. Such structures can be built conveniently using pointers or references and explicit heap-dynamic variables.
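As a rough illustration of such a dynamic structure, the following Python sketch builds a singly linked list that grows and shrinks during execution. The names `Node`, `push_front`, `pop_front`, and `to_list` are hypothetical. Python reclaims storage implicitly, as Java does, so this shows only the use of references to link separately allocated cells, not explicit deallocation in the C++ style.

```python
class Node:
    """One cell of a singly linked list; each Node is a separate heap object."""

    def __init__(self, value, next_node=None):
        self.value = value
        self.next = next_node


def push_front(head, value):
    """Grow the list: allocate a new Node and link it ahead of `head`."""
    return Node(value, head)


def pop_front(head):
    """Shrink the list: unlink the first Node and return (value, new_head)."""
    if head is None:
        raise IndexError("pop from empty list")
    return head.value, head.next


def to_list(head):
    """Collect the values by following the references from node to node."""
    values = []
    while head is not None:
        values.append(head.value)
        head = head.next
    return values


head = None
for v in (1, 2, 3):
    head = push_front(head, v)
print(to_list(head))            # values appear in reverse insertion order
value, head = pop_front(head)
print(value, to_list(head))
```

Once `pop_front` returns, nothing refers to the unlinked node any longer, so the language implementation is free to reclaim it.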
The disadvantages of explicit heap-dynamic variables are the difficulty of using pointer and reference variables correctly, the cost of references to the variables, and the complexity of the required storage management implementation. This is essentially the problem of heap management, which is costly and complicated. Implementation methods for explicit heap-dynamic variables are discussed at length in Chapter 6.
Implicit heap-dynamic variables are bound to heap storage only when they are assigned values. In fact, all their attributes are bound every time they are assigned. For example, consider the following JavaScript assignment statement:
highs = [74, 84, 86, 90, 71];
Regardless of whether the variable named highs was previously used in the program or what it was used for, it is now an array of five numeric values.
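Python variables behave in a similar way, so the rebinding can be sketched there as well. In the snippet below, the list `kinds` is a hypothetical helper that records the type bound to `highs` after each assignment.

```python
highs = 3.5                          # highs is bound to a float
kinds = [type(highs).__name__]

highs = [74, 84, 86, 90, 71]         # the same name is rebound to a list
kinds.append(type(highs).__name__)

highs = "seventy-four"               # and then to a string
kinds.append(type(highs).__name__)

print(kinds)
```

Each assignment rebinds the type as well as the storage, which is why such type information must be kept and checked at run time.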
The advantage of such variables is that they have the highest degree of flexibility, allowing highly generic code to be written. One disadvantage of implicit heap-dynamic variables is the run-time overhead of maintaining all the dynamic attributes, which could include array subscript types and ranges, among others. Another disadvantage is the loss of some error detection by the compiler, as discussed in Section 5.4.2.2.
One of the important factors in understanding variables is scope. The scope of a variable is the range of statements in which the variable is visible. A variable is visible in a statement if it can be referenced or assigned in that statement.
The scope rules of a language determine how a particular occurrence of a name is associated with a variable, or in the case of a functional language, how a name is associated with an expression. In particular, scope rules determine how references to variables declared outside the currently executing subprogram or block are associated with their declarations and thus their attributes (blocks are discussed in Section 5.5.2). A clear understanding of these rules for a language is therefore essential to the ability to write or read programs in that language.
A variable is local in a program unit or block if it is declared there. The nonlocal variables of a program unit or block are those that are visible within the program unit or block but are not declared there. Global variables are a special category of nonlocal variables, which are discussed in Section 5.5.4.
Scoping issues of classes, packages, and namespaces are discussed in Chapter 11.
ALGOL 60 introduced the method of binding names to nonlocal variables called static scoping, which has been copied by many subsequent imperative languages and many nonimperative languages as well. Static scoping is so named because the scope of a variable can be statically determined, that is, prior to execution. This permits a human program reader (and a compiler) to determine the type of every variable in the program simply by examining its source code.
There are two categories of static-scoped languages: those in which subprograms can be nested, which creates nested static scopes, and those in which subprograms cannot be nested. In the latter category, static scopes are also created by subprograms but nested scopes are created only by nested class definitions and blocks.
Ada, JavaScript, Common Lisp, Scheme, Fortran, F#, and Python allow nested subprograms, but the C-based languages do not.
Our discussion of static scoping in this section focuses on those languages that allow nested subprograms. Initially, we assume that all scopes are associated with program units and that all referenced nonlocal variables are declared in other program units. In this chapter, it is assumed that scoping is the only method of accessing nonlocal variables in the languages under discussion. This is not true for all languages. It is not even true for all languages that use static scoping, but the assumption simplifies the discussion here.
When the reader of a program finds a reference to a variable, the attributes of the variable can be determined by finding the statement in which it is declared (either explicitly or implicitly). In static-scoped languages with nested subprograms, this process can be thought of in the following way. Suppose a reference is made to a variable x in subprogram sub1. The correct declaration is found by first searching the declarations of subprogram sub1. If no declaration is found for the variable there, the search continues in the declarations of the subprogram that declared subprogram sub1, which is called its static parent. If a declaration of x is not found there, the search continues to the next-larger enclosing unit (the unit that declared sub1’s parent), and so forth, until a declaration for x is found or the largest unit’s declarations have been searched without success. In that case, an undeclared variable error is reported. The static parent of subprogram sub1, and its static parent, and so forth up to and including the largest enclosing subprogram, are called the static ancestors of sub1. Actual implementation techniques for static scoping, which are discussed in Chapter 10, are usually much more efficient than the process just described.
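The search just described can be modeled directly. The following Python sketch is only a conceptual model, not how a compiler implements static scoping. The class `Scope`, the function `resolve`, and the unit names `outermost` and `parent` are assumptions introduced for illustration.

```python
class Scope:
    """A program unit: the names it declares plus a link to its static parent."""

    def __init__(self, name, declared, parent=None):
        self.name = name
        self.declared = set(declared)
        self.parent = parent        # None for the largest enclosing unit


def resolve(variable, scope):
    """Return the name of the unit whose declaration `variable` refers to."""
    current = scope
    while current is not None:
        if variable in current.declared:
            return current.name
        current = current.parent    # move on to the static parent
    raise NameError(f"undeclared variable: {variable}")


outermost = Scope("outermost", {"x", "z"})
parent = Scope("parent", {"y"}, parent=outermost)
sub1 = Scope("sub1", {"w"}, parent=parent)

print(resolve("w", sub1))   # declared locally in sub1
print(resolve("y", sub1))   # found in sub1's static parent
print(resolve("x", sub1))   # found in the largest enclosing unit
```

The chain of `parent` links followed by `resolve` is exactly the sequence of static ancestors of the unit where the search starts.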
Consider the following JavaScript function, big, in which the two functions sub1 and sub2 are nested:
function big() {
  function sub1() {
    var x = 7;
    sub2();
  }
  function sub2() {
    var y = x;
  }
  var x = 3;
  sub1();
}
Under static scoping, the reference to the variable x in sub2 is to the x declared in the procedure big. This is true because the search for x begins in the procedure in which the reference occurs, sub2, but no declaration for x is found there. The search continues in the static parent of sub2, big, where the declaration of x is found. The x declared in sub1 is ignored, because it is not in the static ancestry of sub2.
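Python also uses static scoping for nested functions, so the same structure can be checked by running it. The sketch below is a translation of the example, with one assumption added for observability: `sub2` returns `y` and the value is passed back out through `sub1` and `big`.

```python
def big():
    def sub1():
        x = 7           # local to sub1; not a static ancestor of sub2
        return sub2()

    def sub2():
        y = x           # x resolves to the x declared in big
        return y

    x = 3
    return sub1()


print(big())
```

Although `sub2` is called from `sub1`, where `x` is 7, the value it sees is the 3 assigned in `big`, because the lookup follows the static nesting and not the calling sequence.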
In some languages that use static scoping, regardless of whether nested subprograms are allowed, some variable declarations can be hidden from some other code segments. For example, consider again the JavaScript function big. The variable x is declared in both big and in sub1, which is nested inside big. Within sub1, every simple reference to x is to the local x. Therefore, the outer x is hidden from sub1.
Many languages allow new static scopes to be defined in the midst of executable code. This powerful concept, introduced in ALGOL 60, allows a section of code to have its own local variables whose scope is minimized. Such variables are typically stack dynamic, so their storage is allocated when the section is entered and deallocated when the section is exited. Such a section of code is called a block. Blocks provide the origin of the phrase block-structured language.
The C-based languages allow any compound statement (a statement sequence surrounded by matched braces) to have declarations and thereby define a new scope. Such compound statements are called blocks. For example, if list were an integer array, one could write the following:
if (list[i] < list[j]) {
  int temp;
  temp = list[i];
  list[i] = list[j];
  list[j] = temp;
}
The scopes created by blocks, which could be nested in larger blocks, are treated exactly like those created by subprograms. References to variables in a block that are not declared there are connected to declarations by searching enclosing scopes (blocks or subprograms) in order of increasing size.
Consider the following skeletal C function:
void sub() {
  int count;
  . . .
  while (. . .) {
    int count;
    count++;
    . . .
  }
  . . .
}
The reference to count in the while loop is to that loop’s local count. In this case, the count of sub is hidden from the code inside the while loop. In general, a declaration for a variable effectively hides any declaration of a variable with the same name in a larger enclosing scope. Note that this code is legal in C and C++ but illegal in Java and C#. The designers of Java and C# believed that the reuse of names in nested blocks was too error prone to be allowed.
Although JavaScript uses static scoping for its nested functions, blocks do not create new scopes for variables declared with var. Only declarations made with let and const, which were added in ECMAScript 2015, are scoped to the enclosing block.
Most functional programming languages include a construct that is related to the blocks of the imperative languages, usually named let. These constructs have two parts, the first of which is to bind names to values, usually specified as expressions. The second part is an expression that uses the names defined in the first part. Programs in functional languages are composed of expressions, rather than statements. Therefore, the final part of a let construct is an expression, rather than a statement. In Scheme, a let construct is a call to the function LET with the following form:
(LET (
  (name_1 expression_1)
  . . .
  (name_n expression_n))
  expression
)
The semantics of the call to LET is as follows: The first n expressions are evaluated and the values are bound to the associated names. Then, the final expression is evaluated and the return value of LET is that value. This differs from a block in an imperative language in that the names are bound to values; they are not variables in the imperative sense. Once set, they cannot be changed. However, they are like local variables in a block in an imperative language in that their scope is local to the call to LET. Consider the following call to LET:
(LET (
  (top (+ a b))
  (bottom (- c d)))
  (/ top bottom)
)
This call computes and returns the value of the expression (a + b) / (c - d).
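One way to see why these names behave like parameters rather than assignable variables is that Scheme's let is conventionally defined as shorthand for applying a lambda expression to the values of the binding expressions. The Python sketch below mimics that reading. The function name `let_example` and the sample arguments are hypothetical.

```python
def let_example(a, b, c, d):
    """Mimic (LET ((top (+ a b)) (bottom (- c d))) (/ top bottom))."""
    # The binding expressions are evaluated first; their values become the
    # arguments, and the names top and bottom exist only in the lambda body.
    return (lambda top, bottom: top / bottom)(a + b, c - d)


print(let_example(7, 3, 9, 4))    # (7 + 3) / (9 - 4)
```

Outside the lambda body, `top` and `bottom` are not visible, which corresponds to their scope being local to the call to LET.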
In ML, the form of a let construct is as follows:
let
  val name_1 = expression_1
  . . .
  val name_n = expression_n
in
  expression
end;
Each val statement binds a name to an expression. As with Scheme, the names in the first part are like the named constants of imperative languages; once set, they cannot be changed. Consider the following let construct:
let
  val top = a + b
  val bottom = c - d
in
  top / bottom
end;
The general form of a let construct in F# is as follows:
let left_side = expression
The left_side of let can be a name or a tuple pattern (a sequence of names separated by commas).
The scope of a name defined with let inside a function definition is from the end of the defining expression to the end of the function. The scope of let can be limited by indenting the following code, which creates a new local scope. Although any indentation will work, the convention is that the indentation is four spaces. Consider the following code:
let n1 =
    let n2 = 7
    let n3 = n2 + 3
    n3;;
let n4 = n3 + n1;;
The scope of n1 extends over all of the code. However, the scope of n2 and n3 ends when the indentation ends. So, the use of n3 in the last let causes an error. The last line of the let n1 scope is the value bound to n1; it could be any expression.
Chapter 15 includes more details of the let constructs in Scheme, ML, Haskell, and F#.
In C89, as well as in some other languages, all data declarations in a function except those in nested blocks must appear at the beginning of the function. However, some languages—for example, C99, C++, Java, JavaScript, and C#—allow variable declarations to appear anywhere a statement can appear in a program unit. Declarations may create scopes that are not associated with compound statements or subprograms. For example, in C99, C++, and Java, the scope of all local variables is from their declarations to the ends of the blocks in which those declarations appear.
In the official documentation for C#, the scope of any variable declared in a block is said to be the whole block, regardless of the position of the declaration in the block, as long as it is not in a nested block. The same is true for methods. However, this is misleading, because the C# language definition requires that all variables be declared before they are used. Therefore, although the scope of a variable is said to extend from the declaration to the top of the block or subprogram in which that declaration appears, the variable still cannot be used above its declaration.
Recall that C# does not allow the declaration of a variable in a nested block to have the same name as a variable in a nesting scope. This, together with the rule that the scope of a declaration is the whole block, makes the following nested declaration of x illegal:
{
  {
    int x;   // Illegal
    ...
  }
  int x;
}
Note that C# still requires that all variables be declared before they are used. Therefore, although the scope of a variable extends from the declaration to the top of the block or subprogram in which that declaration appears, the variable still cannot be used above its declaration.
In JavaScript, local variables can be declared anywhere in a function, but the scope of such a variable is always the entire function. If used before its declaration in the function, such a variable has the value undefined. The reference is not illegal.
The for statements of C++, Java, and C# allow variable definitions in their initialization expressions. In early versions of C++, the scope of such a variable was from its definition to the end of the smallest enclosing block. In the standard version, however, the scope is restricted to the for construct, as is the case with Java and C#. Consider the following skeletal method:
void fun() {
  . . .
  for (int count = 0; count < 10; count++) {
    . . .
  }
  . . .
}
In later versions of C++, as well as in Java and C#, the scope of count is from the for statement to the end of its body (the right brace).
Some languages, including C, C++, PHP, JavaScript, and Python, allow a program structure that is a sequence of function definitions, in which variable definitions can appear outside the functions. Definitions outside functions in a file create global variables, which potentially can be visible to those functions.
C and C++ have both declarations and definitions of global data. Declarations specify types and other attributes but do not cause allocation of storage. Definitions specify attributes and cause storage allocation. For a specific global name, a C program can have any number of compatible declarations, but only a single definition.
A declaration of a variable outside function definitions specifies that the variable is defined in a different file. A global variable in C is implicitly visible in all subsequent functions in the file, except those that include a declaration of a local variable with the same name. A global variable that is defined after a function can be made visible in the function by declaring it to be external, as in the following:
extern int sum;
In C99, definitions of global variables usually have initial values. Declarations of global variables never have initial values. If the declaration is outside function definitions, it need not include the extern qualifier.
This idea of declarations and definitions carries over to the functions of C and C++, where prototypes declare names and interfaces of functions but do not provide their code. Function definitions, on the other hand, are complete.
In C++, a global variable that is hidden by a local with the same name can be accessed using the scope operator (::). For example, if x is a global that is hidden in a function by a local named x, the global could be referenced as ::x.
PHP statements can be interspersed with function definitions. Variables in PHP are implicitly declared when they appear in statements. Any variable that is implicitly declared outside any function is a global variable; variables implicitly declared in functions are local variables. The scope of global variables extends from their declarations to the end of the program but skips over any subsequent function definitions. So, global variables are not implicitly visible in any function. Global variables can be made visible in functions in their scope in two ways: (1) If the function includes a local variable with the same name as a global, that global can be accessed through the $GLOBALS array, using the name of the global as a string literal subscript, and (2) if there is no local variable in the function with the same name as the global, the global can be made visible by including it in a global declaration statement. Consider the following example:
$day = "Monday";
$month = "January";
function calendar() {
  $day = "Tuesday";
  global $month;
  print "local day is $day <br />";
  $gday = $GLOBALS['day'];
  print "global day is $gday <br />";
  print "global month is $month ";
}
calendar();
Interpretation of this code produces the following:
local day is Tuesday
global day is Monday
global month is January
The global variables of JavaScript are very similar to those of PHP, except that there is no way to access a global variable in a function that has declared a local variable with the same name.
The visibility rules for global variables in Python are unusual. Variables are not normally declared, as in PHP. They are implicitly declared when they appear as the targets of assignment statements. A global variable can be referenced in a function, but a global variable can be assigned in a function only if it has been declared to be global in the function. Consider the following examples:
day = "Monday"
def tester():
    print("The global day is:", day)
tester()
The output of this script, because globals can be referenced directly in functions, is as follows:
The global day is: Monday
The following script attempts to assign a new value to the global day:
day = "Monday"
def tester():
    print("The global day is:", day)
    day = "Tuesday"
    print("The new value of day is:", day)
tester()
This script produces an UnboundLocalError, because the assignment to day in the second line of the body of the function makes day a local variable, which makes the reference to day in the first line of the body of the function an illegal forward reference to the local.
The assignment to day can be to the global variable if day is declared to be global at the beginning of the function. This prevents the assignment to day from creating a local variable. This is shown in the following script:
day = "Monday"
def tester():
    global day
    print("The global day is:", day)
    day = "Tuesday"
    print("The new value of day is:", day)
tester()
该脚本的输出如下:
The output of this script is as follows:
The global day is: Monday
The new value of day is: Tuesday
Functions can be nested in Python. Variables defined in nesting functions are accessible in a nested function through static scoping, but such variables must be declared nonlocal in the nested function if they are to be assigned there.9 An example skeletal program in Section 5.7 illustrates accesses to nonlocal variables.
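A small sketch may make the rule concrete. In Python 3, a nested function can read a variable of its enclosing function without any declaration, whereas assigning to it requires nonlocal. The names outer, read_only, and increment are assumptions chosen for illustration.

```python
def outer():
    count = 0              # local to outer

    def read_only():
        return count       # a read of the enclosing variable; no declaration needed

    def increment():
        nonlocal count     # required because count is assigned in this function
        count += 1

    increment()
    increment()
    return read_only()

print(outer())  # 2
```

Without the nonlocal declaration, the assignment in increment would create a new local count, and the statement count += 1 would fail in the same way as the earlier forward-reference example.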
All names defined outside function definitions in F# are globals. Their scope extends from their definitions to the end of the file.
Declaration order and global variables are also issues in the class and member declarations in object-oriented languages. These are discussed in Chapter 12.
Static scoping provides a method of nonlocal access that works well in many situations. However, it is not without its problems. First, in most cases it allows more access to both variables and subprograms than is necessary. It is simply too crude a tool for concisely specifying such restrictions. Second, and perhaps more important, is a problem related to program evolution. Software is highly dynamic—programs that are used regularly continually change. These changes often result in restructuring, thereby destroying the initial structure that restricted variable and subprogram access in a static-scoped language. To avoid the complexity of maintaining these access restrictions, developers often discard structure when it gets in the way. Thus, getting around the restrictions of static scoping can lead to program designs that bear little resemblance to the original, even in areas of the program in which changes have not been made. Designers are encouraged to use far more globals than are necessary. All subprograms can end up being nested at the same level, in the main program, using globals instead of deeper levels of nesting.10 Moreover, the final design may be awkward and contrived, and it may not reflect the underlying conceptual design. These and other defects of static scoping are discussed in detail in Clarke, Wileden, and Wolf (1980). An alternative to the use of static scoping to control access to variables and subprograms is an encapsulation construct, which is included in many newer languages. Encapsulation constructs are discussed in Chapter 11.
The scope of variables in APL, SNOBOL4, and the early versions of Lisp is dynamic. Perl and Common Lisp also allow variables to be declared to have dynamic scope, although the default scoping mechanism in these languages is static. Dynamic scoping is based on the calling sequence of subprograms, not on their spatial relationship to each other. Thus, the scope can be determined only at run time.
Consider again the function big from Section 5.5.1, which is reproduced here, minus the function calls:
function big() {
    function sub1() {
        var x = 7;
    }
    function sub2() {
        var y = x;
        var z = 3;
    }
    var x = 3;
}
Assume that dynamic-scoping rules apply to nonlocal references. The meaning of the identifier x referenced in sub2 is dynamic—it cannot be determined at compile time. It may reference the variable from either declaration of x, depending on the calling sequence.
One way the correct meaning of x can be determined during execution is to begin the search with the local declarations. This is also the way the process begins with static scoping, but that is where the similarity between the two techniques ends. When the search of local declarations fails, the declarations of the dynamic parent, or calling function, are searched. If a declaration for x is not found there, the search continues in that function’s dynamic parent, and so forth, until a declaration for x is found. If none is found in any dynamic ancestor, it is a run-time error.
Consider the two different call sequences for sub2 in the earlier example. First, big calls sub1, which calls sub2. In this case, the search proceeds from the local procedure, sub2, to its caller, sub1, where a declaration for x is found. So, the reference to x in sub2 in this case is to the x declared in sub1. Next, sub2 is called directly from big. In this case, the dynamic parent of sub2 is big, and the reference is to the x declared in big.
Note that if static scoping were used, in either calling sequence discussed, the reference to x in sub2 would be to big’s x.
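The search through dynamic parents can be simulated in a few lines. Python itself is statically scoped, so the sketch below models the run-time stack explicitly: call_stack and lookup are hypothetical helpers, and each activation is represented by a dictionary of its declared variables. It is an illustration of the search order described above, not a description of how any particular language implements dynamic scoping.

```python
# Each active call is a dict of its local variables; the list plays the
# role of the run-time stack, with the most recent activation last.
call_stack = []

def lookup(name):
    """Search the most recent activation first, then each dynamic parent."""
    for frame in reversed(call_stack):
        if name in frame:
            return frame[name]
    raise NameError(f"{name} is not declared in any dynamic ancestor")

def sub2():
    call_stack.append({})            # sub2 declares no x of its own
    try:
        return lookup("x")           # the nonlocal reference in `var y = x`
    finally:
        call_stack.pop()

def sub1():
    call_stack.append({"x": 7})      # var x = 7
    try:
        return sub2()
    finally:
        call_stack.pop()

def big(via_sub1):
    call_stack.append({"x": 3})      # var x = 3
    try:
        return sub1() if via_sub1 else sub2()
    finally:
        call_stack.pop()

print(big(via_sub1=True))    # 7: big calls sub1, which calls sub2
print(big(via_sub1=False))   # 3: big calls sub2 directly
```

The same textual reference to x in sub2 yields two different variables, depending only on which activations are on the stack when it executes.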
Perl’s dynamic scoping is unusual—in fact, it is not exactly like that discussed in this section, although the semantics are often that of traditional dynamic scoping (see Programming Exercise 1).
The effect of dynamic scoping on programming is profound. When dynamic scoping is used, the correct attributes of nonlocal variables visible to a program statement cannot be determined statically. Furthermore, a reference to the name of such a variable is not always to the same variable. A statement in a subprogram that contains a reference to a nonlocal variable can refer to different nonlocal variables during different executions of the subprogram. Several kinds of programming problems follow directly from dynamic scoping.
First, during the time span beginning when a subprogram begins its execution and ending when that execution ends, the local variables of the subprogram are all visible to any other executing subprogram, regardless of its textual proximity or how execution got to the currently executing subprogram. There is no way to protect local variables from this accessibility. Subprograms are always executed in the environment of all previously called subprograms that have not yet completed their executions. As a result, dynamic scoping results in less reliable programs than static scoping.
A second problem with dynamic scoping is the inability to type check references to nonlocals statically. This problem results from the inability to statically find the declaration for a variable referenced as a nonlocal.
Dynamic scoping also makes programs much more difficult to read, because the calling sequence of subprograms must be known to determine the meaning of references to nonlocal variables. This task can be virtually impossible for a human reader.
Finally, accesses to nonlocal variables in dynamic-scoped languages take far longer than accesses to nonlocals when static scoping is used. The reason for this is explained in Chapter 10.
On the other hand, dynamic scoping is not without merit. In many cases, the parameters passed from one subprogram to another are variables that are defined in the caller. None of these needs to be passed in a dynamically scoped language, because they are implicitly visible in the called subprogram.
It is not difficult to understand why dynamic scoping is not as widely used as static scoping. Programs in static-scoped languages are easier to read, are more reliable, and execute faster than equivalent programs in dynamic-scoped languages. It was precisely for these reasons that dynamic scoping was replaced by static scoping in most current dialects of Lisp. Implementation methods for both static and dynamic scoping are discussed in Chapter 10.
Sometimes the scope and lifetime of a variable appear to be related. For example, consider a variable that is declared in a Java method that contains no method calls. The scope of such a variable is from its declaration to the end of the method. The lifetime of that variable is the period of time beginning when the method is entered and ending when execution of the method terminates. Although the scope and lifetime of the variable are clearly not the same, because static scope is a textual, or spatial, concept whereas lifetime is a temporal concept, they at least appear to be related in this case.
This apparent relationship between scope and lifetime does not hold in other situations. In C and C++, for example, a variable that is declared in a function using the specifier static is statically bound to the scope of that function and is also statically bound to storage. So, its scope is static and local to the function, but its lifetime extends over the entire execution of the program of which it is a part.
Scope and lifetime are also unrelated when subprogram calls are involved. Consider the following C++ functions:
void printheader() {
    . . .
} /* end of printheader */
void compute() {
    int sum;
    . . .
    printheader();
} /* end of compute */
The scope of the variable sum is completely contained within the compute function. It does not extend to the body of the function printheader, although printheader executes in the midst of the execution of compute. However, the lifetime of sum extends over the time during which printheader executes. Whatever storage location sum is bound to before the call to printheader, that binding will continue during and after the execution of printheader.
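The same separation of scope and lifetime can be observed in a statically scoped scripting language. The sketch below is a Python analogue of the C++ functions, not a translation of them; the variable is named total rather than sum, an assumption made only because sum is a built-in name in Python and would therefore be found by the lookup.

```python
def printheader():
    # total is local to compute(), so it is not in scope here even though
    # compute() is still active while this function runs.
    try:
        total
    except NameError:
        return "total is not visible in printheader"
    return "total is visible in printheader"

def compute():
    total = 10                  # plays the role of sum in the C++ example
    message = printheader()     # compute is suspended; total stays bound
    return message, total       # total still holds its value after the call

print(compute())  # ('total is not visible in printheader', 10)
```

The variable is alive for the whole of the call to printheader, yet the name cannot be used there: lifetime is about time, scope is about program text.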
The referencing environment of a statement is the collection of all variables that are visible in the statement. The referencing environment of a statement in a static-scoped language is the variables declared in its local scope plus the collection of all variables of its ancestor scopes that are visible. In such a language, the referencing environment of a statement is needed while that statement is being compiled, so code and data structures can be created to allow references to variables from other scopes during run time. Techniques for implementing references to nonlocal variables in both static- and dynamic-scoped languages are discussed in Chapter 10.
In Python, scopes can be created by function definitions. The referencing environment of a statement includes the local variables, plus all of the variables declared in the functions in which the statement is nested (excluding variables in nonlocal scopes that are hidden by declarations in nearer functions). Each function definition creates a new scope and thus a new environment. Consider the following Python skeletal program:
g = 3                  # A global
def sub1():
    a = 5              # Creates a local
    b = 7              # Creates another local
    . . . <------------------------------ 1
def sub2():
    global g           # Global g is now assignable here
    c = 9              # Creates a new local
    . . . <------------------------------ 2
    def sub3():
        nonlocal c     # Makes nonlocal c visible here
        g = 11         # Creates a new local
        . . . <------------------------------ 3
The referencing environments of the indicated program points are as follows:
Point    Referencing Environment
1        local a and b (of sub1), global g for reference, but not for assignment
2        local c (of sub2), global g for both reference and assignment
3        nonlocal c (of sub2), local g (of sub3)
Now consider the variable declarations of this skeletal program. First, note that, although the scope of sub1 is at a higher level (it is less deeply nested) than sub3, the scope of sub1 is not a static ancestor of sub3, so sub3 does not have access to the variables declared in sub1. There is a good reason for this. The variables declared in sub1 are stack dynamic, so they are not bound to storage if sub1 is not in execution. Because sub3 can be in execution when sub1 is not, it cannot be allowed to access variables in sub1, which would not necessarily be bound to storage during the execution of sub3.
A subprogram is active if its execution has begun but has not yet terminated. The referencing environment of a statement in a dynamically scoped language is the locally declared variables, plus the variables of all other subprograms that are currently active. Once again, some variables in active subprograms can be hidden from the referencing environment. Recent subprogram activations can have declarations for variables that hide variables with the same names in previous subprogram activations.
Consider the following example program. Assume that the only function calls are the following: main calls sub2, which calls sub1.
void sub1() {
    int a, b;
    . . . <------------ 1
} /* end of sub1 */
void sub2() {
    int b, c;
    . . . <------------ 2
    sub1();
} /* end of sub2 */
void main() {
    int c, d;
    . . . <------------ 3
    sub2();
} /* end of main */
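The referencing environments of the indicated program points are as follows:
Point    Referencing Environment
1        a and b of sub1, c of sub2, d of main (c of main and b of sub2 are hidden)
2        b and c of sub2, d of main (c of main is hidden)
3        c and d of main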
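The environments for the three marked points can also be derived mechanically from the list of active subprograms. The following sketch assumes a hypothetical helper, referencing_environment, that takes the activations from oldest to most recent and lets later declarations hide earlier ones with the same name.

```python
def referencing_environment(active):
    """Return {name: owner} for the variables visible in the most recent
    activation. `active` lists (subprogram, declared names) pairs, oldest
    activation first."""
    env = {}
    for owner, names in active:
        for name in names:
            env[name] = owner      # a later activation hides an earlier one
    return env

main_act = ("main", ["c", "d"])
sub2_act = ("sub2", ["b", "c"])
sub1_act = ("sub1", ["a", "b"])

# main calls sub2, which calls sub1
point1 = referencing_environment([main_act, sub2_act, sub1_act])
point2 = referencing_environment([main_act, sub2_act])
point3 = referencing_environment([main_act])

print(point1)  # {'c': 'sub2', 'd': 'main', 'b': 'sub1', 'a': 'sub1'}
print(point2)  # {'c': 'sub2', 'd': 'main', 'b': 'sub2'}
print(point3)  # {'c': 'main', 'd': 'main'}
```

At point 1, c of main and b of sub2 do not appear because more recent activations declare variables with the same names.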
A named constant is a variable that is bound to a value only once. Named constants are useful as aids to readability and program reliability. Readability can be improved, for example, using the name pi instead of the constant 3.14159265.
Another important use of named constants is to parameterize a program. For example, consider a program that processes a fixed number of data values, say 100. Such a program usually uses the constant 100 in a number of locations for declaring array subscript ranges and for loop control limits. Consider the following skeletal Java program segment:
void example() {
    int[] intList = new int[100];
    String[] strList = new String[100];
    . . .
    for (index = 0; index < 100; index++) {
        . . .
    }
    . . .
    for (index = 0; index < 100; index++) {
        . . .
    }
    . . .
    average = sum / 100;
    . . .
}
When this program must be modified to deal with a different number of data values, all occurrences of 100 must be found and changed. On a large program, this can be tedious and error prone. An easier and more reliable method is to use a named constant as a program parameter, as follows:
void example() {
    final int len = 100;
    int[] intList = new int[len];
    String[] strList = new String[len];
    . . .
    for (index = 0; index < len; index++) {
        . . .
    }
    . . .
    for (index = 0; index < len; index++) {
        . . .
    }
    . . .
    average = sum / len;
    . . .
}
Now, when the length must be changed, only one line must be changed (the variable len), regardless of the number of times it is used in the program. This is another example of the benefits of abstraction. The name len is an abstraction for the number of elements in some arrays and the number of iterations in some loops. This illustrates how named constants can aid modifiability.
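The same parameterization can be sketched in Python. Python has no enforced constants, so the sketch relies on the upper-case naming convention and the typing.Final annotation, which is checked by static type checkers but not at run time. The function example and its values parameter are assumptions made for illustration.

```python
from typing import Final

LEN: Final = 100    # the single place to change when the data size changes

def example(values):
    """Average the first LEN entries of values (a hypothetical data source)."""
    int_list = [0] * LEN
    total = 0
    for index in range(LEN):
        int_list[index] = values[index]
        total += int_list[index]
    return total / LEN

print(example(list(range(100))))  # 49.5
```

The list size, the loop bound, and the divisor all follow LEN, so changing the number of data values means editing one line.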
C++ allows dynamic binding of values to named constants. This allows expressions containing variables to be assigned to constants in the declarations. For example, the C++ statement
const int result = 2 * width + 1;
declares result to be an integer type named constant whose value is set to the value of the expression 2 * width + 1, where the value of the variable width must be visible when result is allocated and bound to its value.
Java also allows dynamic binding of values to named constants. In Java, named constants are defined with the final reserved word (as in the earlier example). The initial value can be given in the declaration statement or in a subsequent assignment statement. The assigned value can be specified with any expression.
C# has two kinds of named constants: those defined with const and those defined with readonly. The const named constants, which are implicitly static, are statically bound to values; that is, they are bound to values at compile time, which means those values can be specified only with literals or other const members. The readonly named constants, which are dynamically bound to values, can be assigned in the declaration or with a static constructor.11 So, if a program needs a constant-valued object whose value is the same on every use of the program, a const constant is used. However, if a program needs a constant-valued object whose value is determined only when the object is created and can be different for different executions of the program, then a readonly constant is used.
The discussion of binding values to named constants naturally leads to the topic of initialization, because binding a value to a named constant is the same process, except it is permanent.
In many instances, it is convenient for variables to have values before the code of the program or subprogram in which they are declared begins executing. The binding of a variable to a value at the time it is bound to storage is called initialization. If the variable is statically bound to storage, binding and initialization occur before run time. In these cases, the initial value must be specified as a literal or an expression whose only nonliteral operands are named constants that have already been defined. If the storage binding is dynamic, initialization is also dynamic and the initial values can be any expression.
In most languages, initialization is specified on the declaration that creates the variable. For example, in C++, we could have
int sum = 0;
int* ptrSum = &sum;
char name[] = "George Washington Carver";
Case sensitivity and the use of underscores are the design issues for names.
Variables can be characterized by the sextuple of attributes: name, address, value, type, lifetime, and scope.
Aliases are two or more variables bound to the same storage address. They are regarded as detrimental to reliability but are difficult to eliminate entirely from a language.
Binding is the association of attributes with program entities. Knowledge of the binding times of attributes to entities is essential to understanding the semantics of programming languages. Binding can be static or dynamic. Declarations, either explicit or implicit, provide a means of specifying the static binding of variables to types. In general, dynamic binding allows greater flexibility but at the expense of readability, efficiency, and reliability.
Scalar variables can be separated into four categories by considering their lifetimes: static, stack dynamic, explicit heap dynamic, and implicit heap dynamic.
Static scoping is a central feature of ALGOL 60 and some of its descendants. It provides a simple, reliable, and efficient method of allowing visibility of nonlocal variables in subprograms. Dynamic scoping provides more flexibility than static scoping but, again, at the expense of readability, reliability, and efficiency.
Most functional languages allow the user to create local scopes with let constructs, which limit the scope of their defined names.
The referencing environment of a statement is the collection of all of the variables that are visible to that statement.
Named constants are simply variables that are bound to values only once.
What are the design issues for names?
What is the potential danger of case-sensitive names?
What is an alias?
Which category of C++ reference variables always produces aliases?
What is the l-value of a variable? What is the r-value?
Define binding and binding time.
After language design and implementation, what are the four times bindings can take place in a program?
Define static binding and dynamic binding.
What are the advantages and disadvantages of implicit declarations?
What are the advantages and disadvantages of dynamic type binding?
Define static, stack-dynamic, explicit heap-dynamic, and implicit heap-dynamic variables. What are their advantages and disadvantages?
Define lifetime, scope, static scope, and dynamic scope.
How is a reference to a nonlocal variable in a static-scoped program connected to its definition?
What is the general problem with static scoping?
What is the referencing environment of a statement?
What is a static ancestor of a subprogram? What is a dynamic ancestor of a subprogram?
What is a block?
What is the purpose of the let constructs in functional languages?
What is the difference between the names defined in an ML let construct and the variables declared in a C block?
Describe the encapsulation of an F# let inside a function and outside all functions.
What are the advantages and disadvantages of dynamic scoping?
What are the advantages of named constants?
Which of the following identifier forms is most readable? Support your decision.
SumOfSales
sum_of_sales
SUMOFSALES
Some programming languages are typeless. What are the obvious advantages and disadvantages of having no types in a language?
Write a simple assignment statement with one arithmetic operator in some language you know. For each component of the statement, list the various bindings that are required to determine the semantics when the statement is executed. For each binding, indicate the binding time used for the language.
Dynamic type binding is closely related to implicit heap-dynamic variables. Explain this relationship.
Describe a situation when a history-sensitive variable in a subprogram is useful.
Consider the following JavaScript skeletal program:
// The main program
var x;
function sub1() {
    var x;
    function sub2() {
        . . .
    }
}
function sub3() {
    . . .
}
Assume that the execution of this program is in the following unit order:
main calls sub1
sub1 calls sub2
sub2 calls sub3
a. Assuming static scoping, which declaration of x is the correct one for a reference to x in each of the following?
sub1
sub2
sub3
b. Repeat part a, but assume dynamic scoping.
Assume the following JavaScript program was interpreted using static-scoping rules. What value of x is displayed in function sub1? Under dynamic-scoping rules, what value of x is displayed in function sub1?
var x;
function sub1() {
    document.write("x = " + x + "");
}
function sub2() {
    var x;
    x = 10;
    sub1();
}
x = 5;
sub2();
Consider the following JavaScript program:
var x, y, z;
function sub1() {
    var a, y, z;
    function sub2() {
        var a, b, z;
        . . .
    }
    . . .
}
function sub3() {
    var a, x, w;
    . . .
}
List all the variables, along with the program units where they are declared, that are visible in the bodies of sub1, sub2, and sub3, assuming static scoping is used.
Consider the following Python program:
x = 1;
y = 3;
z = 5;
def sub1():
    a = 7;
    y = 9;
    z = 11;
    . . .
def sub2():
    global x;
    a = 13;
    x = 15;
    w = 17;
    . . .
    def sub3():
        nonlocal a;
        a = 19;
        b = 21;
        z = 23;
        . . .
. . .
List all the variables, along with the program units where they are declared, that are visible in the bodies of sub1, sub2, and sub3, assuming static scoping is used.
Consider the following C program:
void fun(void) {
    int a, b, c; /* definition 1 */
    . . .
    while (. . .) {
        int b, c, d; /* definition 2 */
        . . . <------------- 1
        while (. . .) {
            int c, d, e; /* definition 3 */
            . . . <------------- 2
        }
        . . . <-------------- 3
    }
    . . . <---------------- 4
}
For each of the four marked points in this function, list each visible variable, along with the number of the definition statement that defines it.
Consider the following skeletal C program:
void fun1(void); /* prototype */
void fun2(void); /* prototype */
void fun3(void); /* prototype */
void main() {
    int a, b, c;
    . . .
}
void fun1(void) {
    int b, c, d;
    . . .
}
void fun2(void) {
    int c, d, e;
    . . .
}
void fun3(void) {
    int d, e, f;
    . . .
}
Given the following calling sequences and assuming that dynamic scoping is used, what variables are visible during execution of the last function called? Include with each visible variable the name of the function in which it was defined.
main calls fun1; fun1 calls fun2; fun2 calls fun3.
main calls fun1; fun1 calls fun3.
main calls fun2; fun2 calls fun3; fun3 calls fun1.
main calls fun3; fun3 calls fun1.
main calls fun1; fun1 calls fun3; fun3 calls fun2.
main calls fun3; fun3 calls fun2; fun2 calls fun1.
Consider the following program, written in JavaScript-like syntax:
// main program
var x, y, z;
function sub1() {
    var a, y, z;
    . . .
}
function sub2() {
    var a, b, z;
    . . .
}
function sub3() {
    var a, x, w;
    . . .
}
Given the following calling sequences and assuming that dynamic scoping is used, what variables are visible during execution of the last subprogram activated? Include with each visible variable the name of the unit where it is declared.
main calls sub1; sub1 calls sub2; sub2 calls sub3.
main calls sub1; sub1 calls sub3.
main calls sub2; sub2 calls sub3; sub3 calls sub1.
main calls sub3; sub3 calls sub1.
main calls sub1; sub1 calls sub3; sub3 calls sub2.
main calls sub3; sub3 calls sub2; sub2 calls sub1.
Perl allows both static and a kind of dynamic scoping. Write a Perl program that uses both and clearly shows the difference in effect of the two. Explain clearly the difference between the dynamic scoping described in this chapter and that implemented in Perl.
Write a Common Lisp program that clearly shows the difference between static and dynamic scoping.
Write a JavaScript script that has subprograms nested three deep and in which each nested subprogram references variables defined in all of its enclosing subprograms.
Repeat Programming Exercise 3 with Python.
Write a C99 function that includes the following sequence of statements:
x = 21;
int x;
x = 42;
Run the program and explain the results. Rewrite the same code in C++ and Java and compare the results.
Write test programs in C++, Java, and C# to determine the scope of a variable declared in a for statement. Specifically, the code must determine whether such a variable is visible after the body of the for statement.
Write three functions in C or C++: one that declares a large array statically, one that declares the same large array on the stack, and one that creates the same large array from the heap. Call each of the subprograms a large number of times (at least 100,000) and output the time required by each. Explain the results.
This chapter first introduces the concept of a data type and the characteristics of the common primitive data types. Then, the designs of enumeration and subrange types are discussed. Next, the details of structured data types (specifically arrays, associative arrays, records, tuples, lists, and unions) are investigated. This section is followed by an in-depth look at pointers and references. The last category of types discussed is the optional types.
For each of the various categories of data types, the design issues are stated and the design choices made by the designers of some common languages are described. These designs are then evaluated.
The next three sections provide a thorough investigation of type checking, strong typing, and type equivalence rules. The last section of the chapter briefly introduces the fundamentals of the theory of data types.
Implementation methods for data types sometimes have a significant impact on their design. Therefore, implementation of the various data types is another important part of this chapter, especially arrays.
A data type defines a collection of data values and a set of predefined operations on those values. Computer programs produce results by manipulating data. An important factor in determining the ease with which they can perform this task is how well the data types available in the language being used match the objects in the real world of the problem being addressed. Therefore, it is crucial that a language supports an appropriate collection of data types and structures.
The contemporary concepts of data typing have evolved over the last 60 years. In the earliest languages, all problem space data structures had to be modeled with only a few basic language-supported data structures. For example, in pre-90 Fortrans, linked lists and binary trees were implemented with arrays.
The data structures of COBOL took the first step away from the Fortran I model by allowing programmers to specify the accuracy of decimal data values, and also by providing a structured data type for records of information. PL/I extended the capability of accuracy specification to integer and floating-point types. The designers of PL/I included many data types, with the intent of supporting a large range of applications. A better approach, introduced in ALGOL 68, is to provide a few basic types and a few flexible structure-defining operators that allow a programmer to design a data structure for each need. Clearly, this was one of the most important advances in the evolution of data type design. User-defined types also provide improved readability through the use of meaningful names for types. They allow type checking of the variables of a special category of use, which would otherwise not be possible. User-defined types also aid modifiability: A programmer can change the type of a category of variables in a program by changing a type definition statement only.
Taking the concept of a user-defined type a step further, we arrive at abstract data types, which are supported by most programming languages designed since the mid-1980s. The fundamental idea of an abstract data type is that the interface of a type, which is visible to the user, is separated from the representation and set of operations on values of that type, which are hidden from the user. All of the types provided by a high-level programming language are abstract data types. User-defined abstract data types are discussed in detail in Chapter 11.
There are a number of uses of the type system of a programming language. The most practical of these is error detection. The process and value of type checking, which is directed by the type system of the language, are discussed in Section 6.12. A second use of a type system is the assistance it provides for program modularization. This results from the cross-module type checking that ensures the consistency of the interfaces among modules. Another use of a type system is documentation. The type declarations in a program document information about its data, which provides clues about the program’s behavior.
The type system of a programming language defines how a type is associated with each expression in the language and includes its rules for type equivalence and type compatibility. Certainly, one of the most important parts of understanding the semantics of a programming language is understanding its type system.
The two most common structured (nonscalar) data types in the imperative languages are arrays and records, although the popularity of associative arrays has increased significantly in recent years. Lists have been a central part of functional programming languages since the first such language appeared in 1959 (Lisp). Over the last decade, the increasing popularity of functional programming has led to lists being added to primarily imperative languages, such as Python and C#.
The structured data types are defined with type operators, or constructors, which are used to form type expressions. For example, C uses brackets and asterisks as type operators to specify arrays and pointers.
It is convenient, both logically and concretely, to think of variables in terms of descriptors. A descriptor is the collection of the attributes of a variable. In an implementation, a descriptor is an area of memory that stores the attributes of a variable. If the attributes are all static, descriptors are required only at compile time. These descriptors are built by the compiler, usually as a part of the symbol table, and are used during compilation. For dynamic attributes, however, part or all of the descriptor must be maintained during execution. In this case, the descriptor is used by the run-time system. In all cases, descriptors are used for type checking and building the code for the allocation and deallocation operations.
Care must be taken when using the term variable. One who uses only traditional imperative languages may think of identifiers as variables, but that can lead to confusion when considering data types. Identifiers do not have data types in some programming languages. It is wise to remember that identifiers are just one of the attributes of a variable.
The word object is often associated with the value of a variable and the space it occupies. In this book, however, we reserve object exclusively for instances of user-defined and language-defined abstract data types, rather than for the values of all program variables of predefined types. Objects are discussed in detail in Chapters 11 and 12.
In the following sections, many common data types are discussed. For most, design issues particular to the type are stated. For all, one or more example designs are described. One design issue is fundamental to all data types: What operations are provided for variables of the type, and how are they specified?
Data types that are not defined in terms of other types are called primitive data types. Nearly all programming languages provide a set of primitive data types. Some of the primitive types are merely reflections of the hardware—for example, most integer types. Others require only a little nonhardware support for their implementation.
To specify the structured types, the primitive data types of a language are used, along with one or more type constructors.
Some early programming languages only had numeric primitive types. Numeric types still play a central role among the collections of types supported by contemporary languages.
The most common primitive numeric data type is integer. The hardware of many computers supports several sizes of integers. These sizes of integers, and often a few others, are supported by some programming languages. For example, Java includes four signed integer sizes: byte, short, int, and long. Some languages, for example, C++ and C#, include unsigned integer types, which are types for integer values without signs. Unsigned types are often used for binary data.
A signed integer value is represented in a computer by a string of bits, with one of the bits (typically the leftmost) representing the sign. Most integer types are supported directly by the hardware. One example of an integer type that is not supported directly by the hardware is the long integer type of Python (F# also provides such integers). Values of this type can have unlimited length. Long integer values can be specified as literals, as in the following example:
243725839182756281923L
Integer arithmetic operations in Python that produce values too large to be represented with int type store them as long integer type values.
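The L suffix and the separate long integer type belong to Python 2. In Python 3, which the following sketch assumes, the single int type is unbounded, so the same value is written without the suffix and arithmetic on it stays exact.

```python
big = 243725839182756281923      # no L suffix in Python 3
product = big * big              # still an int, computed exactly

print(type(product).__name__)    # int
print(big.bit_length())          # bits needed to hold the value
```

The bit length is larger than 64, so this value cannot be held in any integer size that common hardware supports directly.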
A negative integer could be stored in sign-magnitude notation, in which the sign bit is set to indicate negative and the remainder of the bit string represents the absolute value of the number. Sign-magnitude notation, however, does not lend itself to computer arithmetic. Most computers now use a notation called twos complement to store negative integers, which is convenient for addition and subtraction. In twos-complement notation, the representation of a negative integer is formed by taking the logical complement of the positive version of the number and adding one. Ones-complement notation is still used by some computers. In ones-complement notation, the negative of an integer is stored as the logical complement of its absolute value. Ones-complement notation has the disadvantage that it has two representations of zero. See any book on assembly language programming for details of integer representations.
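The two complement notations can be illustrated with a short Python sketch. The 8-bit width and the function names are assumptions chosen for the example, and the functions return the bit pattern as a non-negative integer so that it can be printed in binary.

```python
def ones_complement(value, bits=8):
    """Bit pattern of -value in ones-complement: invert every bit."""
    mask = (1 << bits) - 1
    return ~value & mask

def twos_complement(value, bits=8):
    """Bit pattern of -value in twos-complement: invert, then add one."""
    mask = (1 << bits) - 1
    return (ones_complement(value, bits) + 1) & mask

print(format(twos_complement(5), "08b"))   # 11111011
print(format(ones_complement(5), "08b"))   # 11111010
# Ones-complement has two zeros: 00000000 and its complement 11111111.
print(format(ones_complement(0), "08b"))   # 11111111
# Twos-complement has only one: negating zero gives zero again.
print(format(twos_complement(0), "08b"))   # 00000000
```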
Floating-point data types model real numbers, but the representations are only approximations for many real values. For example, neither of the fundamental numbers π or e (the base for the natural logarithms) can be correctly represented in floating-point notation. Of course, neither of these numbers can be precisely represented in any finite amount of computer memory. On most computers, floating-point numbers are stored in binary, which exacerbates the problem. For example, even the value 0.1 in decimal cannot be represented by a finite number of binary digits.1 Another problem with floating-point types is the loss of accuracy through arithmetic operations. For more information on the problems of floating-point notation, see any book on numerical analysis.
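Both problems can be observed directly. The following Python sketch assumes that float is stored as a binary double, which is the case on common platforms. It uses the standard fractions module to expose the exact value stored for 0.1 and then adds that value ten times.

```python
from fractions import Fraction

# The stored value is a nearby binary fraction, not exactly one tenth.
stored = Fraction(0.1)
print(stored == Fraction(1, 10))   # False

# The small representation error accumulates through arithmetic.
total = sum([0.1] * 10)
print(total == 1.0)                # False
print(abs(total - 1.0) < 1e-9)     # True: close, but not equal
```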
Floating-point values are represented as fractions and exponents, a form that is borrowed from scientific notation. Older computers used a variety of different representations for floating-point values. However, most newer machines use the IEEE Floating-Point Standard 754 format. Language implementors use whatever representation is supported by the hardware. Most languages include two floating-point types, often called float and double. The float type is the standard size, usually stored in four bytes of memory. The double type is provided for situations where larger fractional parts and/or a larger range of exponents is needed. Double-precision variables usually occupy twice as much storage as float variables and provide at least twice the number of bits of fraction.
The collection of values that can be represented by a floating-point type is defined in terms of precision and range. Precision is the accuracy of the fractional part of a value, measured as the number of bits. Range is a combination of the range of fractions and, more important, the range of exponents.
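The difference in size and precision between the two common floating-point types can be seen with Python's standard struct module, which packs values in IEEE 754 single ("f") and double ("d") formats. This is a sketch: the values printed from sys.float_info assume a platform whose float is an IEEE 754 double.

```python
import struct
import sys

single = struct.pack("<f", 0.1)   # IEEE 754 single precision
double = struct.pack("<d", 0.1)   # IEEE 754 double precision
print(len(single), len(double))   # 4 8

# Round-tripping through single precision discards fraction bits.
as_single = struct.unpack("<f", single)[0]
print(as_single == 0.1)           # False

# Precision and range of Python's own float type.
print(sys.float_info.mant_dig)    # bits of precision
print(sys.float_info.max_10_exp)  # largest decimal exponent
```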
Figure 6.1 shows the IEEE Floating-Point Standard 754 format for single- and double-precision representation (IEEE, 1985). Details of the IEEE formats can be found in Tanenbaum (2005).
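The single-precision layout can also be inspected in code. The sketch below relies on the standard IEEE 754 field widths for single precision (1 sign bit, 8 exponent bits stored with a bias of 127, and 23 fraction bits). The function name single_fields is an assumption made for the example.

```python
import struct

def single_fields(x):
    """Split x, stored as IEEE 754 single precision, into its three fields."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF     # stored with a bias of 127
    fraction = bits & 0x7FFFFF         # 23 explicit fraction bits
    return sign, exponent, fraction

print(single_fields(1.0))    # (0, 127, 0)
print(single_fields(-0.5))   # (1, 126, 0)
```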
Some programming languages support a complex data type—for example, Fortran and Python. Complex values are represented as ordered pairs of floating-point values. In Python, the imaginary part of a complex literal is specified by following it with a j or J—for example,
(7 + 3j)
Languages that support a complex type include operations for arithmetic on complex values.
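As a brief illustration in Python, the real and imaginary parts of a complex value are available as floating-point attributes, and the usual arithmetic operators apply. The variable names are chosen only for the example.

```python
z = 7 + 3j
w = 1 - 2j

print(z.real, z.imag)   # 7.0 3.0
print(z + w)            # (8+1j)
print(z * w)            # (13-11j)
print(abs(3 + 4j))      # 5.0
```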
Most larger computers that are designed to support business systems applications have hardware support for decimal data types. Decimal data types store a fixed number of decimal digits, with the implied decimal point at a fixed position in the value. These are the primary data types for business data processing and are therefore essential to COBOL. C# and F# also have decimal data types.
Decimal types have the advantage of being able to precisely store decimal values, at least those within a restricted range, which cannot be done with floating-point. For example, the number 0.1 (in decimal) can be exactly represented in a decimal type, but not in a floating-point type, as is noted in Section 6.2.1.2. The disadvantages of decimal types are that the range of values is restricted because no exponents are allowed, and their representation in memory is mildly wasteful, for reasons discussed in the following paragraph.
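The exactness of decimal storage can be sketched with Python's standard decimal module. Note the assumption involved: Python's Decimal is a decimal floating-point type rather than the fixed-point decimal described here, but it shows the same exact treatment of decimal fractions.

```python
from decimal import Decimal

# Ten exact tenths add up to exactly one.
decimal_total = sum([Decimal("0.1")] * 10)
print(decimal_total == Decimal("1"))   # True

# The same sum in binary floating-point does not.
float_total = sum([0.1] * 10)
print(float_total == 1.0)              # False
```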
Decimal types are stored very much like character strings, using binary codes for the decimal digits. These representations are called binary coded decimal (BCD). In some cases, they are stored one digit per byte, but in others, they are packed two digits per byte. Either way, they take more storage than binary representations. It takes at least four bits to code a decimal digit. Therefore, to store a six-digit coded decimal number requires 24 bits of memory. However, it takes only 20 bits to store the same number in binary.2 The operations on decimal values are done in hardware on machines that have such capabilities; otherwise, they are simulated in software.
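The storage comparison can be checked with a small Python sketch of the packed form, in which each decimal digit takes four bits and two digits share a byte. The function name to_packed_bcd is hypothetical, and the sketch ignores signs and the position of the decimal point.

```python
def to_packed_bcd(number):
    """Encode a non-negative integer as packed BCD, two digits per byte."""
    digits = str(number)
    if len(digits) % 2:
        digits = "0" + digits          # pad to a whole number of bytes
    return bytes(int(digits[i]) << 4 | int(digits[i + 1])
                 for i in range(0, len(digits), 2))

packed = to_packed_bcd(123456)
print(packed.hex())                    # 123456
print(len(packed) * 8)                 # 24 bits as packed BCD
print((999999).bit_length())           # 20 bits for the largest six-digit value in binary
```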
Boolean types are perhaps the simplest of all types. Their range of values has only two elements: one for true and one for false. They were introduced in ALGOL 60 and have been included in most general-purpose languages designed since 1960. One popular exception is C89, in which numeric expressions are used as conditionals. In such expressions, all operands with nonzero values are considered true, and zero is considered false. Although C99 and C++ have a Boolean type, they also allow numeric expressions to be used as if they were Boolean. This is not the case in the subsequent languages, Java and C#.
Boolean types are often used to represent switches or flags in programs. Although other types, such as integers, can be used for these purposes, the use of Boolean types is more readable.
A Boolean value could be represented by a single bit, but because a single bit of memory cannot be accessed efficiently on many machines, they are often stored in the smallest efficiently addressable cell of memory, typically a byte.
Character data are stored in computers as numeric codings. Traditionally, the most commonly used coding was ASCII (American Standard Code for Information Interchange), a 7-bit code usually stored in 8-bit bytes, which uses the values 0 to 127 to code 128 different characters. ISO 8859-1 is an 8-bit character code that allows 256 different characters.
Because of the globalization of business and the need for computers to communicate with other computers around the world, the ASCII character set became inadequate. In response, in 1991, the Unicode Consortium published the UCS-2 standard, a 16-bit character set. This character code is often called Unicode. Unicode includes the characters from most of the world’s natural languages. For example, Unicode includes the Cyrillic alphabet, as used in Serbia, and the Thai digits. The first 128 characters of Unicode are identical to those of ASCII. Java was the first widely used language to use the Unicode character set. Since then, it has found its way into JavaScript, Python, Perl, C#, F#, and Swift.
After 1991, the Unicode Consortium, in cooperation with the International Organization for Standardization (ISO), developed a 4-byte character code named UCS-4, or UTF-32, which is described in the ISO/IEC 10646 Standard, published in 2000.
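The relationship among these codings can be observed in Python, whose strings are sequences of Unicode characters. In this sketch the UTF-16 encoding stands in for UCS-2, which is an assumption that holds for characters that fit in 16 bits, such as the Cyrillic letter used here.

```python
# The first 128 Unicode characters have the same codes as in ASCII.
print(ord("A"))                                     # 65
print("A".encode("ascii") == "A".encode("utf-8"))   # True

cyrillic = "\u0436"                          # a Cyrillic letter
print(len(cyrillic.encode("utf-16-be")))     # 2 bytes in a 16-bit code
print(len(cyrillic.encode("utf-32-be")))     # 4 bytes in UTF-32
```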
To provide the means of processing codings of single characters, most programming languages include a primitive type for them. However, Python supports single characters only as character strings of length 1.
A character string type is one in which the values consist of sequences of characters. Character string constants are used to label output, and the input and output of all kinds of data are often done in terms of strings. Of course, character strings also are an essential type for all programs that do character manipulation.
The two most important design issues that are specific to character string types are the following:
Should strings be a special kind of character array or a primitive type?
Should strings have static or dynamic length?
The most common string operations are assignment, catenation, substring reference, comparison, and pattern matching.
A substring reference is a reference to a substring of a given string. Substring references are discussed in the more general context of arrays, where the substring references are called slices.
In general, both assignment and comparison operations on character strings are complicated by the possibility of string operands of different lengths. For example, what happens when a longer string is assigned to a shorter string, or vice versa? Usually, simple and sensible choices are made for these situations, although programmers often have trouble remembering them.
In some languages, pattern matching is supported directly in the language. In others, it is provided by a function or class library.
If strings are not defined as a primitive type, string data is usually stored in arrays of single characters and referenced as such in the language. This is the approach taken by C and C++, which use char arrays to store character strings. These languages provide a collection of string operations through standard libraries. Many users of strings and many of the library functions use the convention that character strings are terminated with a special character, null, which is represented with zero. This is an alternative to maintaining the length of string variables. The library operations simply carry out their operations until the null character appears in the string being operated on. Library functions that produce strings often supply the null character. The character string literals that are built by the compiler also have the null character. For example, consider the following declaration:
char str[] = "apples";
In this example, str represents an array of char elements, specifically apples0, where 0 is the null character.
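The same layout can be examined from Python through the standard ctypes module, whose create_string_buffer function builds a C-style char array from a bytes value and appends the terminating null. This is only a way of inspecting the representation, not something the C declaration itself requires.

```python
import ctypes

buf = ctypes.create_string_buffer(b"apples")
print(ctypes.sizeof(buf))   # 7: six characters plus the null character
print(buf.raw)              # b'apples\x00'
print(buf.value)            # b'apples', read up to the first null
```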
Some of the most commonly used library functions for character strings in C and C++ are strcpy, which moves strings; strcat, which catenates one given string onto another; strcmp, which lexicographically compares (by the order of their character codes) two given strings; and strlen, which returns the number of characters, not counting the null character, in the given string. The parameters and return values for most of the string manipulation functions are char pointers that point to arrays of char. Parameters can also be string literals.
The string manipulation functions of the C standard library, which are also available in C++, are inherently unsafe and have led to numerous programming errors. The problem is that the functions in this library that move string data do not guard against overflowing the destination. For example, consider the following call to strcpy:
strcpy(dest, src);
If the length of dest is 20 and the length of src is 50, strcpy will write over the 30 bytes that follow dest. The point is that strcpy does not know the length of dest, so it cannot ensure that the memory following it will not be overwritten. The same problem can occur with several of the other functions in the C string library. In addition to C-style strings, C++ also supports strings through its standard class library, which is also similar to that of Java. Because of the insecurities of the C string library, C++ programmers should use the string class from the standard library, rather than char arrays and the C string library.
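The overflow can be simulated safely in Python by modeling memory as one flat bytearray in which a 20-byte destination is followed by 30 bytes of unrelated data. The functions unchecked_copy and checked_copy are hypothetical stand-ins written for this sketch. The first mimics the behavior of strcpy, and the second shows the length check that strcpy cannot perform because it is never told the size of its destination.

```python
def unchecked_copy(memory, dest, src):
    """Copy bytes up to and including the null, as strcpy does."""
    for offset, byte in enumerate(src):
        memory[dest + offset] = byte
        if byte == 0:
            break

def checked_copy(memory, dest, dest_size, src):
    """Refuse to copy when src, with its null, will not fit in dest."""
    needed = src.index(0) + 1
    if needed > dest_size:
        raise ValueError("source does not fit in destination")
    memory[dest:dest + needed] = src[:needed]

# A 20-byte dest at offset 0, followed by 30 bytes of other data.
memory = bytearray(b"\0" * 20 + b"N" * 30)
src = b"A" * 49 + b"\0"                 # 50 bytes including the null

unchecked_copy(memory, 0, src)
print(memory[20:50])                    # the neighboring data is overwritten
```

Calling checked_copy(memory, 0, 20, src) on the same arguments raises ValueError and leaves the neighboring bytes untouched.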
In Java, strings are supported by the String class, whose values are constant strings, and the StringBuffer class, whose values are changeable and are more like arrays of single characters. These values are specified with methods of the StringBuffer class. C# and Ruby include string classes that are similar to those of Java.
Python includes strings as a primitive type and has operations for substring reference, catenation, indexing to access individual characters, as well as methods for searching and replacement. There is also an operation for character membership in a string. So, even though Python’s strings are primitive types, for character and substring references, they act very much like arrays of characters. However, Python strings are immutable, similar to the String class objects of Java.
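The operations just listed look as follows in Python. The string value and variable names are chosen only for the example.

```python
s = "apples"

print(s[0])                    # a       (indexing)
print(s[1:4])                  # ppl     (substring reference, a slice)
print(s + " and pears")        # catenation
print("p" in s)                # True    (character membership)
print(s.find("les"))           # 3       (searching)
print(s.replace("a", "A"))     # Apples  (replacement returns a new string)

try:
    s[0] = "A"                 # strings are immutable
except TypeError as error:
    print("TypeError:", error)
```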
In F#, strings are a class. Individual characters, which are represented in Unicode UTF-16, can be accessed, but not changed. Strings can be catenated with the + operator. In ML, string is a primitive immutable type. It uses ^ for its catenation operator and includes functions for substring referencing and getting the size of a string.
In Swift, the String type supports its character strings. String values can be either constants or variables. The binary + operator catenates String values. The append method is used to add a Character to a String. The characters property of String is used to examine individual characters of a String.
SNOBOL 4 was the first widely known language to support pattern matching.
Perl, JavaScript, Ruby, and PHP include built-in pattern-matching operations. In these languages, the pattern-matching expressions are somewhat loosely based on mathematical regular expressions. In fact, they are often called regular expressions. They evolved from the early UNIX line editor, ed, to become part of the UNIX shell languages. Eventually, they grew to their current complex form. There is at least one complete book on this kind of pattern-matching expressions (Friedl, 2006). In this section, we provide a brief look at the style of these expressions through two relatively simple examples.
Consider the following pattern expression:
/[A-Za-z][A-Za-z\d]+/
This pattern matches (or describes) the typical name form in programming languages. The brackets enclose character classes. The first character class specifies all letters; the second specifies all letters and digits (a digit is specified with the abbreviation \d). If only the second character class were included, we could not prevent a name from beginning with a digit. The plus operator following the second category specifies that there must be one or more of what is in the category. So, the whole pattern matches strings that begin with a letter, followed by one or more letters or digits.
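The pattern can be tried out with Python's standard re module, which is used here only as a convenient test bed. The pattern is written without the surrounding slashes, and fullmatch requires the whole candidate string to match. Because the plus operator demands at least one character after the first letter, a one-letter name is rejected.

```python
import re

NAME = re.compile(r"[A-Za-z][A-Za-z\d]+")

for candidate in ["sum1", "x", "2fast", "total_cost"]:
    print(candidate, bool(NAME.fullmatch(candidate)))
# sum1 True
# x False
# 2fast False
# total_cost False
```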
Next, consider the following pattern expression:
/\d+\.?\d*|\.\d+/
This pattern matches numeric literals. The \. specifies a literal decimal point. The question mark quantifies what it follows to have zero or one appearance. The vertical bar (|) separates two alternatives in the whole pattern. The first alternative matches strings of one or more digits, possibly followed by a decimal point, followed by zero or more digits; the second alternative matches strings that begin with a decimal point, followed by one or more digits.
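The two alternatives can be exercised with a small sketch, again using Python's re module as a stand-in for the built-in operators of Perl or JavaScript. The names NUMBER and is_number are hypothetical.

```python
import re

# The numeric-literal pattern from the text, with its two alternatives.
NUMBER = re.compile(r"\d+\.?\d*|\.\d+")

def is_number(text):
    """Return True if the whole string matches the numeric-literal pattern."""
    return NUMBER.fullmatch(text) is not None

for candidate in ["42", "3.14", "7.", ".5", ".", "1.2.3"]:
    print(candidate, is_number(candidate))
```

The first four candidates match (the first three through the first alternative, ".5" through the second); a lone decimal point and a literal with two decimal points do not.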
Pattern-matching capabilities using regular expressions are included in the class libraries of C++, Java, Python, C#, and F#.
There are several design choices regarding the length of string values. First, the length can be static and set when the string is created. Such a string is called a static length string. This is the choice for the strings of Python, the immutable objects of Java’s String class, as well as similar classes in the C++ standard class library, Ruby’s built-in String class, and the .NET class library available to C# and F#.
The second option is to allow strings to have varying length up to a declared and fixed maximum set by the variable’s definition, as exemplified by the strings in C and the C-style strings of C++. These are called limited dynamic length strings. Such string variables can store any number of characters between zero and the maximum. Recall that strings in C use a special character to indicate the end of the string’s characters, rather than maintaining the string length.
The third option is to allow strings to have varying length with no maximum, as in JavaScript, Perl, and the standard C++ library. These are called dynamic length strings. This option requires the overhead of dynamic storage allocation and deallocation but provides maximum flexibility.
String types are important to the writability of a language. Dealing with strings as arrays can be more cumbersome than dealing with a primitive string type. For example, consider a language that treats strings as arrays of characters and does not have a predefined function that does what strcpy in C does. Then, a simple assignment of one string to another would require a loop. The addition of strings as a primitive type to a language is not costly in terms of either language or compiler complexity. Therefore, it is difficult to justify the omission of primitive string types in some contemporary languages. Of course, providing strings through a standard library is nearly as convenient as having them as a primitive type.
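The loop mentioned above can be sketched as follows. Python lists of one-character strings stand in for character arrays, and the null character marks the end of the string, as in C. The function name copy_chars and the variables src and dst are our own inventions.

```python
def copy_chars(source, dest):
    """Copy a null-terminated character array element by element,
    roughly what strcpy does in C."""
    i = 0
    while source[i] != "\0":
        dest[i] = source[i]
        i += 1
    dest[i] = "\0"   # copy the terminator as well
    return dest

src = list("hello") + ["\0"]
dst = [""] * 10      # destination array with room to spare
copy_chars(src, dst)
print(dst[:6])
```

With a primitive string type, or a library string class, the whole loop collapses to a single assignment.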
String operations such as simple pattern matching and catenation are essential and should be included for string type values. Although dynamic length strings are obviously the most flexible, the overhead of their implementation must be weighed against that additional flexibility.
Character string types could be supported directly in hardware; but in most cases, software is used to implement string storage, retrieval, and manipulation. When character string types are represented as character arrays, the language often supplies few operations.
A descriptor for a static character string type, which is required only during compilation, has three fields. The first field of every descriptor is the name of the type. In the case of static character strings, the second field is the type’s length (in characters). The third field is the address of the first character. This descriptor is shown in Figure 6.2. Limited dynamic strings require a run-time descriptor to store the fixed maximum length, the current length, and the address, as shown in Figure 6.3. Dynamic length strings require a simpler run-time descriptor because only the current length and the address need to be stored. Although we depict descriptors as independent blocks of storage, in most cases, they are stored in the symbol table.
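To make the limited dynamic case concrete, here is a minimal sketch of a run-time descriptor with the three fields named above. Everything in it is an assumption made for illustration: the class LimitedDynamicDescriptor, the list memory that stands in for run-time storage, and the helper functions assign and value.

```python
from dataclasses import dataclass

@dataclass
class LimitedDynamicDescriptor:
    max_length: int       # fixed when the variable is defined
    current_length: int   # changes as the string value changes
    address: int          # index of the first character in `memory`

memory = [""] * 32        # stand-in for run-time storage

def assign(desc, text):
    """Store text in the string described by desc, enforcing the maximum."""
    if len(text) > desc.max_length:
        raise ValueError("string exceeds declared maximum length")
    for i, ch in enumerate(text):
        memory[desc.address + i] = ch
    desc.current_length = len(text)

def value(desc):
    """Read back the current characters of the described string."""
    return "".join(memory[desc.address:desc.address + desc.current_length])

name = LimitedDynamicDescriptor(max_length=8, current_length=0, address=4)
assign(name, "Ada")
print(value(name), name.current_length)
```

A descriptor for a dynamic length string would simply drop the max_length field.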
The limited dynamic strings of C and C++ do not require run-time descriptors, because the end of a string is marked with the null character. They do not need the maximum length, because index values in array references are not range checked in these languages.
Static length and limited dynamic length strings require no special dynamic storage allocation. In the case of limited dynamic length strings, sufficient storage for the maximum length is allocated when the string variable is bound to storage, so only a single allocation process is involved.
Dynamic length strings require more complex storage management. The length of a string, and therefore the storage to which it is bound, must grow and shrink dynamically.
There are three approaches to supporting the dynamic allocation and deallocation that is required for dynamic length strings. First, strings can be stored in a linked list, so that when a string grows, the newly required cells can come from anywhere in the heap. The drawbacks to this method are the extra storage occupied by the links in the list representation and the necessary complexity of string operations.
The second approach is to store strings as arrays of pointers to individual characters allocated in the heap. This method still uses extra memory, but string processing can be faster than with the linked-list approach.
The third alternative is to store complete strings in adjacent storage cells. The problem with this method arises when a string grows: How can storage that is adjacent to the existing cells continue to be allocated for the string variable? Frequently, such storage is not available. Instead, a new area of memory is found that can store the complete new string, and the old part is moved to this area. Then, the memory cells used for the old string are deallocated. This latter approach is the one typically used. The general problem of managing allocation and deallocation of variable-size segments is discussed in Section 6.11.7.3.
Although the linked-list method requires more storage, the associated allocation and deallocation processes are simple. However, some string operations are slowed by the required pointer chasing. On the other hand, using adjacent memory for complete strings results in faster string operations and requires significantly less storage, but the allocation and deallocation processes are slower.
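The adjacent-storage approach can be sketched with a toy heap. This is only a model under stated assumptions: the class Heap, its integer handles, and the function grow are hypothetical, and a real storage manager would deal with addresses and free lists rather than Python lists.

```python
class Heap:
    """A toy heap: each block is a separate list, tracked by an integer handle."""
    def __init__(self):
        self.blocks = {}
        self.next_handle = 0

    def allocate(self, size):
        handle = self.next_handle
        self.next_handle += 1
        self.blocks[handle] = [""] * size
        return handle

    def deallocate(self, handle):
        del self.blocks[handle]

def grow(heap, handle, extra_text):
    """Append extra_text by moving the string to a new, larger block."""
    old = heap.blocks[handle]
    new_handle = heap.allocate(len(old) + len(extra_text))
    new = heap.blocks[new_handle]
    new[:len(old)] = old                 # move the old part
    new[len(old):] = list(extra_text)    # add the new characters
    heap.deallocate(handle)              # release the old cells
    return new_handle

heap = Heap()
s = heap.allocate(3)
heap.blocks[s][:] = list("abc")
s = grow(heap, s, "def")
print("".join(heap.blocks[s]))
```

Every growth step copies the whole string, which is the allocation cost the paragraph above weighs against faster string operations.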
An enumeration type is one in which all of the possible values, which are named constants, are provided, or enumerated, in the definition. Enumeration types provide a way of defining and grouping collections of named constants, which are called enumeration constants. The definition of a typical enumeration type is shown in the following C# example:
enum days {Mon, Tue, Wed, Thu, Fri, Sat, Sun};
The enumeration constants are typically implicitly assigned the integer values 0, 1, ..., but can be explicitly assigned any integer literal in the type’s definition.
The design issues for enumeration types are as follows:
Is an enumeration constant allowed to appear in more than one type definition, and if so, how is the type of an occurrence of that constant in the program checked?
Are enumeration values coerced to integer?
Are any other types coerced to an enumeration type?
All of these design issues are related to type checking. If an enumeration variable is coerced to a numeric type, then there is little control over its range of legal operations or its range of values. If an int type value is coerced to an enumeration type, then an enumeration type variable could be assigned any integer value, whether it represented an enumeration constant or not.
In languages that do not have enumeration types, programmers usually simulate them with integer values. For example, suppose we needed to represent colors in a C program and C did not have an enumeration type. We might use 0 to represent red, 1 to represent blue, and so forth. These values could be defined as follows:
int red = 0, blue = 1;
Now, in the program, we could use red and blue as if they were of a color type. The problem with this approach is that because we have not defined a type for our colors, there is no type checking when they are used. For example, it would be legal to add the two together, although that would rarely be an intended operation. They could also be combined with any other numeric type operand using any arithmetic operator, which would also rarely be useful. Furthermore, because they are just variables, they could be assigned any integer value, thereby destroying the relationship with the colors. This latter problem could be prevented by making them named constants.
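The weaknesses just listed are easy to demonstrate. The sketch below uses Python rather than C, but the point carries over: plain integer variables carry no color type. The names mixed and scaled are hypothetical.

```python
red = 0
blue = 1

mixed = red + blue    # legal, though adding two colors is rarely meaningful
scaled = blue * 2.5   # combining with other numeric types is also accepted
blue = 42             # nothing ties blue to the set of color codes

print(mixed, scaled, blue)
```

None of these statements is rejected, which is exactly the absence of type checking that the paragraph describes.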
C and Pascal were the first widely used languages to include an enumeration data type. C++ includes C’s enumeration types. In C++, we could have the following:
enum colors {red, blue, green, yellow, black};
colors myColor = blue, yourColor = red;
The colors type uses the default internal values for the enumeration constants, 0, 1, ..., although the constants could have been specifically assigned any integer literal (or any constant-valued expression) by the programmer. The enumeration values are coerced to int when they are put in integer context. This allows their use in any numeric expression. For example, if the current value of myColor is blue, then the expression
myColor++
would assign the integer code for green to myColor.
C++ also allows enumeration constants to be assigned to variables of any numeric type, though that would likely be an error. However, no other type value is coerced to an enumeration type in C++. For example,
myColor = 4;
is illegal in C++. This assignment would be legal if the right side had been cast to colors type. This prevents some potential errors.
C++ enumeration constants can appear in only one enumeration type in the same referencing environment.
In 2004, an enumeration type was added to Java in Java 5.0. All enumeration types in Java are implicitly subclasses of the predefined class Enum. Because enumeration types are classes, they can have instance data fields, constructors, and methods. Syntactically, Java enumeration type definitions appear like those of C++, except that they can include fields, constructors, and methods. The possible values of an enumeration are the only possible instances of the class. All enumeration types inherit toString, as well as a few other methods. An array of the instances of an enumeration type can be fetched with the static method values. The internal numeric value of an enumeration variable can be fetched with the ordinal method. No expression of any other type can be assigned to an enumeration variable. Also, an enumeration variable is never coerced to any other type.
C# enumeration types are like those of C++, except that they are never coerced to integer. So, operations on enumeration types are restricted to those that make sense. Also, the range of values is restricted to that of the particular enumeration type.
In ML, enumeration types are defined as new types with datatype declarations. For example, we could have the following:
datatype weekdays = Monday | Tuesday | Wednesday |
                    Thursday | Friday
The type of the elements of weekdays is integer.
F# has enumeration types that are similar to those of ML, except the reserved word type is used instead of datatype and the first value is preceded by an OR operator (|).
Swift has an enumeration type in which the enumeration values are names, which represent themselves, rather than having internal integer values. An enumeration type is defined in a structure that is similar to a switch structure, as in:
enum fruit {
    case orange
    case apple
    case banana
}
Dot notation is used to reference enumeration values, so in our example, the value of apple is referenced as fruit.apple.
Interestingly, none of the relatively recent scripting languages include enumeration types. These include Perl, JavaScript, PHP, Python, and Ruby. Even Java was a decade old before enumeration types were added.
Enumeration types can provide advantages in both readability and reliability. Readability is enhanced very directly: Named values are easily recognized, whereas coded values are not.
In the area of reliability, the enumeration types of C#, F#, Java 5.0, and Swift provide two advantages: (1) no arithmetic operations are legal on enumeration types, which prevents adding days of the week, for example; and (2) no enumeration variable can be assigned a value outside its defined range. If the colors enumeration type has 10 enumeration constants and uses 0..9 as its internal values, no number greater than 9 can be assigned to a colors type variable.
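Both advantages can be observed in a small sketch. It uses the Enum class from Python's standard library purely as a stand-in; that class is a library facility rather than a built-in type, and it is not one of the languages named above. The type Day and the helpers try_add and try_convert are hypothetical.

```python
from enum import Enum

class Day(Enum):
    MON = 0
    TUE = 1
    WED = 2

def try_add(a, b):
    """Attempt arithmetic on two enumeration values."""
    try:
        return a + b
    except TypeError:
        return None   # arithmetic on the members is rejected

def try_convert(number):
    """Attempt to turn an integer into an enumeration value."""
    try:
        return Day(number)
    except ValueError:
        return None   # no member has this internal value

print(try_add(Day.MON, Day.TUE))
print(try_convert(1))
print(try_convert(9))
```

Adding two days fails, and an integer outside the defined values cannot become a Day.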
Because C treats enumeration variables like integer variables, it does not provide either of these two advantages.
C++ is a little better. Numeric values can be assigned to enumeration type variables only if they are cast to the type of the assigned variable. Numeric values assigned to enumeration type variables are checked to determine whether they are in the range of the internal values of the enumeration type. Unfortunately, if the user uses a wide range of explicitly assigned values, this checking is not effective. For example,
enum colors {red = 1, blue = 1000, green = 100000}
In this example, a value assigned to a variable of colors type will only be checked to determine whether it is in the range of 1..100000.
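The weakness of a pure range test, as it is described here, can be sketched as follows. The dictionary COLORS mirrors the enumeration above; the functions in_range and is_constant are our own, and is_constant is a stricter alternative shown only for contrast.

```python
COLORS = {"red": 1, "blue": 1000, "green": 100000}

def in_range(value):
    """The range test described in the text: lowest..highest internal value."""
    return min(COLORS.values()) <= value <= max(COLORS.values())

def is_constant(value):
    """A stricter test: value must be one of the enumeration constants."""
    return value in COLORS.values()

print(in_range(500), is_constant(500))
```

The value 500 passes the range test even though it corresponds to no color.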
An array is a homogeneous aggregate of data elements in which an individual element is identified by its position in the aggregate, relative to the first element. The individual data elements of an array are of the same type. References to individual array elements are specified using subscript expressions. If any of the subscript expressions in a reference include variables, then the reference will require an additional run-time calculation to determine the address of the memory location being referenced.
In many languages, such as C, C++, Java, and C#, all of the elements of an array are required to be of the same type. In these languages, pointers and references are restricted to point to or reference a single type. So the objects or data values being pointed to or referenced are also of a single type. In some other languages, such as JavaScript, Python, and Ruby, variables are typeless references to objects or data values. In these cases, arrays still consist of elements of a single type, but the elements can reference objects or data values of different types. Such arrays are still homogeneous, because the array elements are of the same type. In Swift, arrays can be typed, that is, they will contain values only of a single type, or untyped, which means they can contain values of any type.
C# and Java 5.0 provide generic arrays, that is, arrays whose elements are references to objects, through their class libraries. These are discussed in Section 6.5.3.
The primary design issues specific to arrays are the following:
What types are legal for subscripts?
Are subscripting expressions in element references range checked?
When are subscript ranges bound?
When does array allocation take place?
Are ragged or rectangular multidimensioned arrays allowed, or both?
Can arrays be initialized when they have their storage allocated?
What kinds of slices are allowed, if any?
In the following sections, examples of the design choices made for the arrays of the most common programming languages are discussed.
Specific elements of an array are referenced by means of a two-level syntactic mechanism, where the first part is the aggregate name, and the second part is a possibly dynamic selector consisting of one or more items known as subscripts or indices. If all of the subscripts in a reference are constants, the selector is static; otherwise, it is dynamic. The selection operation can be thought of as a mapping from the array name and the set of subscript values to an element in the aggregate. Indeed, arrays are sometimes called finite mappings. Symbolically, this mapping can be shown as
array_name(subscript_value_list) → element
The designers of pre-90 Fortrans and PL/I chose parentheses for array subscripts because no other suitable characters were available at the time. Card punches did not include bracket characters.
The syntax of array references is fairly universal: The array name is followed by the list of subscripts, which is surrounded by either parentheses or brackets. In some languages that provide multidimensioned arrays as arrays of arrays, each subscript appears in its own brackets. A problem with using parentheses to enclose subscript expressions is that they often are also used to enclose the parameters in subprogram calls; this use makes references to arrays appear exactly like those calls. For example, consider the following Ada assignment statement:
Sum := Sum + B(I);
Because parentheses are used for both subprogram parameters and array subscripts in Ada, both program readers and compilers are forced to use other information to determine whether B(I) in this assignment is a function call or a reference to an array element. This results in reduced readability.
Fortran I limited the number of array subscripts to three, because at the time of the design, execution efficiency was a primary concern. Fortran I designers had developed a very fast method for accessing the elements of arrays of up to three dimensions, using the three index registers of the IBM 704. Fortran IV was first implemented on an IBM 7094, which had seven index registers. This allowed Fortran IV’s designers to allow arrays with up to seven subscripts. Most other contemporary languages enforce no such limits.
The designers of Ada specifically chose parentheses to enclose subscripts so there would be uniformity between array references and function calls in expressions, in spite of potential readability problems. They made this choice in part because both array element references and function calls are mappings. Array element references map the subscripts to a particular element of the array. Function calls map the actual parameters to the function definition and, eventually, a functional value.
Most languages other than Fortran and Ada use brackets to delimit their array indices.
Two distinct types are involved in an array type: the element type and the type of the subscripts. The type of the subscripts is often integer.
Early programming languages did not specify that subscript ranges must be implicitly checked. Range errors in subscripts are common in programs, so requiring range checking is an important factor in the reliability of languages. Many contemporary languages also do not specify range checking of subscripts, but Java, ML, and C# do.
Subscripting in Perl is a bit unusual in that although the names of all arrays begin with at signs (@), because array elements are always scalars and the names of scalars always begin with dollar signs ($), references to array elements use dollar signs rather than at signs in their names. For example, for the array @list, the second element is referenced with $list[1].
One can reference an array element in Perl with a negative subscript, in which case the subscript value is an offset from the end of the array. For example, if the array @list has five elements with the subscripts 0..4, $list[-2] references the element with the subscript 3. A reference to a nonexistent element in Perl yields undef, but no error is reported.
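The arithmetic behind a negative subscript is simple: the subscript is added to the length of the array. A minimal sketch, in which the function name resolve is hypothetical:

```python
def resolve(subscript, length):
    """Map a possibly negative subscript to a position counted from the start."""
    if subscript < 0:
        return length + subscript   # offset from the end of the array
    return subscript

# With five elements (subscripts 0..4), -2 selects subscript 3.
print(resolve(-2, 5))
```

Under this rule, a subscript of -1 always selects the last element.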
The binding of the subscript type to an array variable is usually static, but the subscript value ranges are sometimes dynamically bound.
In some languages, the lower bound of the subscript range is implicit. For example, in the C-based languages, the lower bound of all subscript ranges is fixed at 0. In some other languages, the lower bounds of the subscript ranges must be specified by the programmer.
There are four categories of arrays, based on the binding to subscript ranges, the binding to storage, and from where the storage is allocated. The category names indicate the design choices of these three. In the first three of these categories, once the subscript ranges are bound and the storage is allocated, they remain fixed for the lifetime of the variable. Of course, when the subscript ranges are fixed, the array cannot change size.
A static array is one in which the subscript ranges are statically bound and storage allocation is static (done before run time). The advantage of static arrays is efficiency: No dynamic allocation or deallocation is required. The disadvantage is that the storage for the array is fixed for the entire execution time of the program.
A fixed stack-dynamic array is one in which the subscript ranges are statically bound, but the allocation is done at declaration elaboration time during execution. The advantage of fixed stack-dynamic arrays over static arrays is space efficiency. A large array in one subprogram can use the same space as a large array in a different subprogram, as long as both subprograms are not active at the same time. The same is true if the two arrays are in different blocks that are not active at the same time. The disadvantage is the required allocation and deallocation time.
A fixed heap-dynamic array is similar to a fixed stack-dynamic array, in that the subscript ranges and the storage binding are both fixed after storage is allocated. The differences are that both the subscript ranges and storage bindings are done when the user program requests them during execution, and the storage is allocated from the heap, rather than the stack. The advantage of fixed heap-dynamic arrays is flexibility—the array’s size always fits the problem. The disadvantage is allocation time from the heap, which is longer than allocation time from the stack.
A heap-dynamic array is one in which the binding of subscript ranges and storage allocation is dynamic and can change any number of times during the array’s lifetime. The advantage of heap-dynamic arrays over the others is flexibility: Arrays can grow and shrink during program execution as the need for space changes. The disadvantage is that allocation and deallocation take longer and may happen many times during execution of the program. Examples of the four categories are given in the following paragraphs.
Arrays declared in C and C++ functions that include the static modifier are static.
Arrays that are declared in C and C++ functions without the static specifier are examples of fixed stack-dynamic arrays.
C and C++ also provide fixed heap-dynamic arrays. The standard C library functions malloc and free, which are general heap allocation and deallocation operations, respectively, can be used for C arrays. C++ uses the operators new and delete to manage heap storage. An array is treated as a pointer to a collection of storage cells, where the pointer can be indexed, as discussed in Section 6.11.5.
In Java, all non-generic arrays are fixed heap-dynamic. Once created, these arrays keep the same subscript ranges and storage. C# also provides fixed heap-dynamic arrays.
Objects of the C# List class are generic heap-dynamic arrays. These array objects are created without any elements, as in
List<String> stringList = new List<String>();
Elements are added to this object with the Add method, as in
stringList.Add("Michael");
Access to elements of these arrays is through subscripting.
Java includes a generic class similar to C#’s List, named ArrayList. It is different from C#’s List in that subscripting is not supported—get and set methods must be used to access the elements.
A Perl array can be made to grow by using push (which puts one or more new elements on the end of the array) and unshift (which puts one or more new elements on the beginning of the array), or by assigning a value to an element whose subscript is beyond the highest current subscript of the array. An array can be made to shrink to no elements by assigning it the empty list, (). The length of an array is defined to be the largest subscript plus one.
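The three ways of growing an array can be imitated in a short sketch. It is written in Python, not Perl, so append and insert play the roles of push and unshift, and the helper store (a hypothetical name) imitates assignment beyond the current end, which a Python list does not allow directly.

```python
def store(array, subscript, value, filler=None):
    """Assign array[subscript], extending the array first if needed."""
    if subscript >= len(array):
        array.extend([filler] * (subscript + 1 - len(array)))
    array[subscript] = value

items = [10, 20, 30]
items.append(40)       # like push: add at the end
items.insert(0, 5)     # like unshift: add at the beginning
store(items, 7, 99)    # assign beyond the highest current subscript

print(items, len(items))
```

After the last assignment the largest subscript is 7, so the length is 8, in keeping with the definition of length as the largest subscript plus one.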
Like Perl, JavaScript allows arrays to grow with the push and unshift methods and shrink by setting them to the empty list. However, negative subscripts are not supported.
JavaScript arrays can be sparse, meaning the subscript values need not be contiguous. For example, suppose we have an array named list that has 10 elements with the subscripts 0..9. Consider the following assignment statement:
list[50] = 42;
Now, list has 11 elements and length 51. The elements with subscripts 10..49 are not defined and therefore do not require storage. A reference to a nonexistent element in a JavaScript array yields undefined.
Arrays in Python and Ruby can be made to grow only through methods to add elements or catenate other arrays. Ruby, Perl, and Python all support negative subscripts, which count backward from the end of the array. In Python an element or slice of an array can be deleted. A reference to a nonexistent element in Python results in a run-time error, whereas a similar reference in Ruby yields nil and no error is reported.
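The Python behavior described above can be sketched as follows. This is a minimal illustration, not taken from the text; the list name scores and its values are assumptions chosen for the example.

```python
# Growing, shrinking, and out-of-range access on a Python list.
# The name `scores` and its contents are hypothetical.
scores = [10, 20, 30]

scores.append(40)            # add one element at the end
scores = scores + [50, 60]   # catenate another list
grown = list(scores)         # snapshot of the grown list

del scores[0]                # delete a single element
del scores[1:3]              # delete a slice (subscripts 1 and 2)

try:
    scores[10]               # there is no element with subscript 10
    out_of_range_raised = False
except IndexError:           # Python reports a run-time error
    out_of_range_raised = True
```

After the two deletions, scores holds the three remaining elements, and the out-of-range reference is reported as an IndexError rather than yielding a placeholder value.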
Swift dynamic arrays are objects that use integer subscripts, beginning at zero, and include several useful methods. The append method adds an element to the end of an array. The insert method inserts a new element at any position in the array, but results in an error if the insertion is at a subscript beyond the current length of the array. Elements can be removed from an array with the removeAtIndex method. There are also reverse and count methods.
Although the ML definition does not include arrays, its widely used implementation, SML/NJ, does.
The only predefined collection type that is part of F# is the array (other collection types are provided through the .NET Framework Library). These arrays are like those of C#. A foreach statement is included in the language for array processing.
Some languages provide the means to initialize arrays at the time their storage is allocated. C, C++, Java, Swift, and C# allow initialization of their arrays. Consider the following C declaration:
int list [] = {4, 5, 7, 83};
The array list is created and initialized with the values 4, 5, 7, and 83. The compiler also sets the length of the array. This is meant to be a convenience but is not without cost. It effectively removes the possibility that the system could detect some kinds of programmer errors, such as mistakenly leaving a value out of the list.
As discussed in Section 6.3.2, character strings in C and C++ are implemented as arrays of char. These arrays can be initialized to string constants, as in
char name [] = "freddie";
The array name will have eight elements, because all strings are terminated with a null character (zero), which is implicitly supplied by the system for string constants.
Arrays of strings in C and C++ can also be initialized with string literals. For example,
char *names [] = {"Bob", "Jake", "Darcie"};
This example illustrates the nature of string literals in C and C++. In the previous example of a string literal being used to initialize the char array name, the literal is taken to be a char array. But in the latter example (names), the literals are taken to be pointers to characters, so the array is an array of pointers to characters. For example, names[0] is a pointer to the letter 'B' in the literal character array that contains the characters 'B', 'o', 'b', and the null character.
In Java, similar syntax is used to define and initialize an array of references to String objects. For example,
String[] names = {"Bob", "Jake", "Darcie"};
An array operation is one that operates on an array as a unit. The most common array operations are assignment, catenation, comparison for equality and inequality, and slices, which are discussed separately in Section 6.5.5.
The C-based languages do not provide any array operations, except through the methods of Java, C++, and C#. Perl supports array assignments but does not support comparisons.
Python’s arrays are called lists, although they have all the characteristics of dynamic arrays. Because the objects can be of any types, these arrays are heterogeneous. Python provides array assignment, although it is only a reference change. Python also has operations for array catenation (+) and element membership (in). It includes two different comparison operators: one that determines whether the two variables reference the same object (is) and one that compares all corresponding objects in the referenced objects, regardless of how deeply they are nested, for equality (==).
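A short sketch of these Python list operations follows. The variable names and values are assumptions made for illustration only.

```python
# Assignment, catenation, membership, and the two comparison operators.
# All names and values here are hypothetical.
first = [1, [2, 3], "four"]      # heterogeneous: an int, a list, and a str
alias = first                    # assignment copies only the reference
alias.append(5)                  # the change is visible through both names

copy_of_first = [1, [2, 3], "four", 5]   # a distinct object, same contents

same_object = alias is first             # identity: both name one object
copy_is_first = copy_of_first is first   # identity: two separate objects
equal_to_copy = copy_of_first == first   # equality: nested elements compared

joined = first + [6, 7]                  # catenation builds a new list
has_four = "four" in first               # element membership
```

Because assignment is only a reference change, a program that needs an independent array must copy it explicitly, for example with list(first).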
Like Python, the elements of Ruby’s arrays are references to objects. And like Python, when a == operator is used between two arrays, the result is true only if the two arrays have the same length and the corresponding elements are equal. Ruby’s arrays can be catenated with an Array method.
F# includes many array operators in its Array module. Among these are Array.append, Array.copy, and Array.length.
Arrays and their operations are the heart of APL; it is the most powerful array-processing language ever devised. Because of its relative obscurity and its lack of effect on subsequent languages, however, we present here only a glimpse into its array operations.
In APL, the four basic arithmetic operations are defined for vectors (single-dimensioned arrays) and matrices, as well as scalar operands. For example,
A + B
is a valid expression, whether A and B are scalar variables, vectors, or matrices.
APL includes a collection of unary operators for vectors and matrices, some of which are as follows (where V is a vector and M is a matrix):
APL also includes several special operators that take other operators as operands. One of these is the inner product operator, which is specified with a period (.). It takes two operands, which are binary operators. For example,
+.×
is a new operator that takes two arguments, either vectors or matrices. It first multiplies the corresponding elements of two arguments, and then it sums the results. For example, if A and B are vectors,
A × B
is the element-wise product of A and B (a vector of the products of the corresponding elements of A and B). The statement
A +.× B
is the sum of those products, which is the mathematical inner product of A and B. If A and B are matrices, this expression specifies the matrix multiplication of A and B.
The special operators of APL are actually functional forms, which are described in Chapter 15.
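The idea of an operator that takes other operators as operands can be sketched in Python with a higher-order function. This is only an analogy to APL's +.× and not APL semantics in full; the names inner_product, plus_dot_times, and mat_mul are assumptions introduced here, and the vectors are assumed to be nonempty.

```python
from functools import reduce
import operator


def inner_product(f, g):
    """Build a new two-argument operation from binary operations f and g.

    The result applies g to corresponding elements and folds with f.
    """
    def apply(a, b):
        if len(a) != len(b):
            raise ValueError("vectors must have the same length")
        return reduce(f, map(g, a, b))
    return apply


# A rough counterpart of +.× for vectors: multiply pairwise, then sum.
plus_dot_times = inner_product(operator.add, operator.mul)


def mat_mul(a, b):
    """Apply plus_dot_times to each row of a and each column of b."""
    columns = list(zip(*b))
    return [[plus_dot_times(row, col) for col in columns] for row in a]


A = [1, 2, 3]
B = [4, 5, 6]
elementwise = list(map(operator.mul, A, B))   # products of matching elements
dot = plus_dot_times(A, B)                    # sum of those products

M = [[1, 2], [3, 4]]
N = [[5, 6], [7, 8]]
product = mat_mul(M, N)                       # matrix multiplication
```

Passing a different pair of operations to inner_product yields a different combined operation, which is the sense in which such operators are functional forms.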
A rectangular array is a multidimensioned array in which all of the rows have the same number of elements and all of the columns have the same number of elements. Rectangular arrays model rectangular tables exactly.
A jagged array is one in which the lengths of the rows need not be the same. For example, a jagged matrix may consist of three rows, one with 5 elements, one with 7 elements, and one with 12 elements. This also applies to the columns and higher dimensions. So, if there is a third dimension (layers), each layer can have a different number of elements. Jagged arrays are made possible when multidimensioned arrays are actually arrays of arrays. For example, a matrix would appear as an array of single-dimensioned arrays.
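The jagged matrix in the example above can be sketched in Python, where a matrix is a list of lists. The names jagged and is_rectangular are assumptions made for illustration.

```python
# A jagged matrix as an array of single-dimensioned arrays.
# Row lengths follow the example in the text: 5, 7, and 12 elements.
jagged = [
    [0] * 5,
    [0] * 7,
    [0] * 12,
]

row_lengths = [len(row) for row in jagged]

# Each row is subscripted independently, one pair of brackets per dimension.
jagged[2][11] = 99


def is_rectangular(matrix):
    """Return True when every row has the same number of elements."""
    return len({len(row) for row in matrix}) <= 1
```

Because each row is a separate array, nothing in the structure itself forces the rows to have equal lengths; a check such as is_rectangular has to be written by the programmer.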
C, C++, and Java support jagged arrays but not rectangular arrays. In those languages, a reference to an element of a multidimensioned array uses a separate pair of brackets for each dimension. For example,
myArray[3][7]
C# and F# support both rectangular and jagged arrays. For rectangular arrays, all subscript expressions in references to elements are placed in a single pair of brackets. For example,
myArray[3, 7]
A slice of an array is some substructure of that array. For example, if A is a matrix, then the first row of A is one possible slice, as are the last row and the first column. It is important to realize that a slice is not a new data type. Rather, it is a mechanism for referencing part of an array as a unit. If arrays cannot be manipulated as units in a language, that language has no use for slices.
Consider the following Python declarations:
vector = [2, 4, 6, 8, 10, 12, 14, 16]
mat = [[1, 2, 3],[4, 5, 6],[7, 8, 9]]
Recall that the default lower bound for Python arrays is 0. The syntax of a Python slice reference is a pair of numeric expressions separated by a colon. The first is the first subscript of the slice; the second is the first subscript after the last subscript in the slice. Therefore, vector[3:6] is a three-element array with the fourth through sixth elements of vector (those elements with the subscripts 3, 4, and 5). A row of a matrix is specified by giving just one subscript. For example, mat[1] refers to the second row of mat; a part of a row can be specified with the same syntax as a part of a single-dimensioned array. For example, mat[0][0:2] refers to the first and second element of the first row of mat, which is [1, 2].
Python also supports more complex slices of arrays. For example, vector[0:7:2] references every other element of vector, up to but not including the element with the subscript 7, starting with the subscript 0, which is [2, 6, 10, 14].
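The slice references discussed above can be checked directly. The names on the left-hand sides are assumptions added for illustration; the last line is an extra sketch showing one way to obtain a column, which a list of lists does not support with slice syntax alone.

```python
vector = [2, 4, 6, 8, 10, 12, 14, 16]
mat = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

middle = vector[3:6]          # subscripts 3, 4, and 5
second_row = mat[1]           # one subscript selects a whole row
row_part = mat[0][0:2]        # first two elements of the first row
every_other = vector[0:7:2]   # subscripts 0, 2, 4, and 6

# A column is not a slice of a list of lists; it has to be built explicitly.
first_column = [row[0] for row in mat]
```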
Perl supports slices of two forms, a list of specific subscripts or a range of subscripts. For example,
@list[1..5] = @list2[3, 5, 7, 9, 13];
Notice that slice references use array names, not scalar names, because slices are arrays (not scalars).
Ruby supports slices with the slice method of its Array object, which can take three forms of parameters. A single integer expression parameter is interpreted as a subscript, in which case slice returns the element with the given subscript. If slice is given two integer expression parameters, the first is interpreted as a beginning subscript and the second is interpreted as the number of elements in the slice. For example, suppose list is defined as follows:
list = [2, 4, 6, 8, 10]
list.slice(2, 2) returns [6, 8]. The third parameter form for slice is a range, which has the form of an integer expression, two periods, and a second integer expression. With a range parameter, slice returns an array of the elements with the given range of subscripts. For example, list.slice(1..3) returns [4, 6, 8].
Arrays have been included in virtually all programming languages. The primary advances since their introduction in Fortran I have been slices and dynamic arrays. As discussed in Section 6.6, the latest advances in arrays have been in associative arrays.
Implementing arrays requires considerably more compile-time effort than does implementing primitive types. The code to allow accessing of array elements must be generated at compile time. At run time, this code must be executed to produce element addresses. There is no way to precompute the address to be accessed by a reference such as
list[k]
A single-dimensioned array is implemented as a list of adjacent memory cells. Suppose the array list is defined to have a subscript range lower bound of 0. The access function for list is often of the form
address(list[k]) = address(list[0]) + k * element_size
where the first operand of the addition is the constant part of the access function, and the second is the variable part.
If the element type is statically bound and the array is statically bound to storage, then the value of the constant part can be computed before run time. However, the addition and multiplication operations must be done at run time.
The generalization of this access function for an arbitrary lower bound is
address(list[k]) = address(list[lower_bound]) + ((k - lower_bound) * element_size)
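The single-dimensioned access function can be sketched as a small Python function. The function names, the base address 1000, and the element size 4 are assumptions chosen for illustration; a real compiler emits the equivalent arithmetic as machine instructions.

```python
def element_address(base_address, k, element_size, lower_bound=0):
    """Address of list[k], where base_address is the address of
    list[lower_bound]."""
    return base_address + (k - lower_bound) * element_size


def element_address_split(base_address, k, element_size, lower_bound=0):
    """Same result, with the subscript-independent part separated out.

    constant_part does not depend on k, so it can be computed once,
    before run time when the array is statically bound to storage.
    """
    constant_part = base_address - lower_bound * element_size
    variable_part = k * element_size
    return constant_part + variable_part


# Hypothetical array: first element at address 1000, 4-byte elements.
zero_based = element_address(1000, 3, 4)                 # list[3], bound 0
one_based = element_address(1000, 3, 4, lower_bound=1)   # list[3], bound 1
```

The second function is an algebraic rearrangement of the first: only the multiplication by k and one addition remain to be done for each reference.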
The compile-time descriptor for single-dimensioned arrays can have the form shown in Figure 6.4. The descriptor includes information required to construct the access function. If run-time checking of index ranges is not done and the attributes are all static, then only the access function is required during execution; no descriptor is needed. If run-time checking of index ranges is done, then those index ranges may need to be stored in a run-time descriptor. If the subscript ranges of a particular array type are static, then the ranges may be incorporated into the code that does the checking, thus eliminating the need for the run-time descriptor. If any of the descriptor entries are dynamically bound, then those parts of the descriptor must be maintained at run time.
True multidimensional arrays, that is, those that are not arrays of arrays, are more complex to implement than single-dimensioned arrays, although the extension to more dimensions is straightforward. Hardware memory is linear, just a simple sequence of bytes. So values of data types that have two or more dimensions must be mapped onto the single-dimensioned memory. There are two ways in which multidimensional arrays can be mapped to one dimension: row major order and column major order (used in Fortran). In row major order, the elements of the array that have as their first subscript the lower bound value of that subscript are stored first, followed by the elements of the second value of the first subscript, and so forth. If the array is a matrix, it is stored by rows. For example, if the matrix had the values
3 4 7
6 2 5
1 3 8
it would be stored in row major order as
3, 4, 7, 6, 2, 5, 1, 3, 8
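The two storage orders can be sketched in Python by flattening the example matrix. The variable names are assumptions; the lists stand in for consecutive memory cells.

```python
matrix = [
    [3, 4, 7],
    [6, 2, 5],
    [1, 3, 8],
]

# Row major order: all of row 0, then all of row 1, and so on.
row_major = [value for row in matrix for value in row]

# Column major order: all of column 0, then all of column 1, and so on.
column_count = len(matrix[0])
column_major = [row[j] for j in range(column_count) for row in matrix]
```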
The access function for a multidimensional array is the mapping of its base address and a set of index values to the address in memory of the element specified by the index values. The access function for two-dimensional arrays stored in row major order can be developed as follows. In general, the address of an element is the base address of the structure plus the element size times the number of elements that precede it in the structure. For a matrix in row major order, the number of elements that precede an element is the number of rows above the element times the size of a row, plus the number of elements to the left of the element in its row. This is illustrated in Figure 6.5, in which we assume that subscript lower bounds are all zero.
Figure 6.5: The location of the [i,j] element in a matrix
To get an actual address value, the number of elements that precede the desired element must be multiplied by the element size. Now, the access function can be written as
location(a[i,j]) = address of a[0, 0] + ((((number of rows above the ith row) * (size of a row)) + (number of elements left of the jth column)) * element size)
Because the number of rows above the ith row is i and the number of elements to the left of the jth column is j, we have
location(a[i, j]) = address of a[0, 0] + (((i * n) + j) * element_size)
where n is the number of elements per row. The first term is the constant part and the last is the variable part.
The generalization to arbitrary lower bounds results in the following access function:
location(a[i, j]) = address of a[row_lb, col_lb] + (((i - row_lb) * n) + (j - col_lb)) * element_size
where row_lb is the lower bound of the rows and col_lb is the lower bound of the columns. This can be rearranged to the form
location(a[i, j]) = address of a[row_lb, col_lb] - (((row_lb * n) + col_lb) * element_size) + (((i * n) + j) * element_size)
where the first two terms are the constant part and the last is the variable part. This can be generalized rather easily to an arbitrary number of dimensions.
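The row major access function, in both its direct and rearranged forms, can be sketched in Python. The function names and the sample addresses are assumptions made for illustration; base stands for the address of a[row_lb, col_lb] and n for the number of elements per row.

```python
def location(base, i, j, n, element_size, row_lb=0, col_lb=0):
    """Direct form: count the elements that precede a[i, j]."""
    preceding = (i - row_lb) * n + (j - col_lb)
    return base + preceding * element_size


def location_rearranged(base, i, j, n, element_size, row_lb=0, col_lb=0):
    """Rearranged form: a constant part plus a part that varies with i, j."""
    constant_part = base - (row_lb * n + col_lb) * element_size
    variable_part = (i * n + j) * element_size
    return constant_part + variable_part


# With base 0 and element size 1, the result is an index into the
# row major sequence of the 3 x 3 example matrix.
row_major = [3, 4, 7, 6, 2, 5, 1, 3, 8]
value_at_1_2 = row_major[location(0, 1, 2, 3, 1)]

# Hypothetical array with lower bounds of 1, based at address 2000,
# with 4-byte elements and 3 elements per row.
addr = location(2000, 2, 3, 3, 4, row_lb=1, col_lb=1)
```

The rearranged form makes the cost visible: once the constant part is known, each reference needs one multiply and one add per dimension.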
For each dimension of an array, one add and one multiply instruction are required for the access function. Therefore, accesses to elements of arrays with several subscripts are costly. The compile-time descriptor for a multidimensional array is shown in Figure 6.6.
An associative array is an unordered collection of data elements that are indexed by an equal number of values called keys. In the case of non-associative arrays, the indices never need to be stored (because of their regularity). In an associative array, however, the user-defined keys must be stored in the structure. So each element of an associative array is in fact a pair of entities, a key and a value. We use Perl’s design of associative arrays to illustrate this data structure. Associative arrays are also supported directly by Python, Ruby, and Swift and by the standard class libraries of Java, C++, C#, and F#.
The only design issue that is specific for associative arrays is the form of references to their elements.
In Perl, associative arrays are called hashes, because in the implementation their elements are stored and retrieved with hash functions. The namespace for Perl hashes is distinct: Every hash variable name must begin with a percent sign (%). Each hash element consists of two parts: a key, which is a string, and a value, which is a scalar (number, string, or reference). Hashes can be set to literal values with the assignment statement, as in
%salaries = ("Gary" => 75000, "Perry" => 57000,
             "Mary" => 55750, "Cedric" => 47850);
Individual element values are referenced using notation that is similar to that used for Perl arrays. The key value is placed in braces and the hash name is replaced by a scalar variable name that is the same except for the first character. Although hashes are not scalars, the value parts of hash elements are scalars, so references to hash element values use scalar names. Recall that scalar variable names begin with dollar signs ($). So, an assignment of 58850 to the element of %salaries with the key "Perry" would appear as follows:
$salaries{"Perry"} = 58850;
A new element is added using the same assignment statement form. An element can be removed from the hash with the delete operator, as in the following:
delete $salaries{"Gary"};
The entire hash can be emptied by assigning the empty literal to it, as in the following:
%salaries = ();
The size of a Perl hash is dynamic: It grows when an element is added and shrinks when an element is deleted, and also when it is emptied by assignment of the empty literal. The exists operator returns true or false, depending on whether its operand key is an element in the hash. For example,
if (exists $salaries{"Shelly"}) . . .
The keys operator, when applied to a hash, returns an array of the keys of the hash. The values operator does the same for the values of the hash. The each operator iterates over the element pairs of a hash.
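For comparison, the same sequence of operations can be sketched with a Python dictionary. This is an analogy, not Perl semantics; the correspondence between each Python construct and the Perl operator named in the comments is an assumption of this sketch.

```python
# The salaries data comes from the Perl example in the text.
salaries = {"Gary": 75000, "Perry": 57000, "Mary": 55750, "Cedric": 47850}

salaries["Perry"] = 58850          # assign to the element with key "Perry"
del salaries["Gary"]               # counterpart of Perl's delete

has_shelly = "Shelly" in salaries  # counterpart of Perl's exists

names = sorted(salaries.keys())    # counterpart of keys
amounts = sorted(salaries.values())  # counterpart of values
pairs = sorted(salaries.items())   # key/value pairs, as each would visit

size_before_clear = len(salaries)
salaries.clear()                   # counterpart of assigning the empty literal
```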
Python’s associative arrays, which are called dictionaries, are similar to those of Perl, except the values are all references to objects. The associative arrays supported by Ruby are similar to those of Python, except that the keys can be any object, rather than just strings. There is a progression from Perl’s hashes, in which the keys must be strings, to PHP’s arrays, in which the keys can be integers or strings, to Ruby’s hashes, in which any type of object can be a key.
PHP’s arrays are both normal arrays and associative arrays. They can be treated as either. The language provides functions that allow both indexed and hashed access to elements. An array can have elements that are created with simple numeric indices and elements that are created with string hash keys.
Swift’s associative arrays are called dictionaries. The keys can be of one specific type, but the values can be of mixed types, in which case they are objects.
An associative array is much better than an array if searches of the elements are required, because the implicit hashing operation used to access elements is very efficient. Furthermore, associative arrays are ideal when the data to be stored is paired, as with employee names and their salaries. On the other hand, if every element of a list must be processed, it is more efficient to use an array.
The implementation of Perl’s associative arrays is optimized for fast lookups, but it also provides relatively fast reorganization when array growth requires it. A 32-bit hash value is computed for each entry and is stored with the entry, although an associative array initially uses only a small part of the hash value. When an associative array must be expanded beyond its initial size, the hash function need not be changed; rather, more bits of the hash value are used. Only half of the entries must be moved when this happens. So, although expansion of an associative array is not free, it is not as costly as might be expected.
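The growth strategy described above can be illustrated with a toy hash table in Python. This is a sketch of the general technique under stated assumptions, not Perl's actual implementation: the class name GrowableTable, the chained buckets, the starting size of four buckets, and the growth trigger are all choices made for this example.

```python
class GrowableTable:
    """Toy chained hash table that stores each entry's 32-bit hash value."""

    def __init__(self, bits=2):
        self.bits = bits                            # hash bits in use
        self.buckets = [[] for _ in range(1 << bits)]
        self.count = 0
        self.last_moved = 0                         # entries moved by last grow

    @staticmethod
    def _hash32(key):
        return hash(key) & 0xFFFFFFFF               # keep 32 bits

    def _index(self, hash_value):
        return hash_value & ((1 << self.bits) - 1)  # low-order bits only

    def put(self, key, value):
        h = self._hash32(key)
        bucket = self.buckets[self._index(h)]
        for entry in bucket:
            if entry[0] == h and entry[1] == key:
                entry[2] = value                    # key present: update
                return
        bucket.append([h, key, value])              # store hash with the entry
        self.count += 1
        if self.count > len(self.buckets):
            self._grow()

    def get(self, key):
        h = self._hash32(key)
        for stored_hash, stored_key, value in self.buckets[self._index(h)]:
            if stored_hash == h and stored_key == key:
                return value
        raise KeyError(key)

    def _grow(self):
        """Double the table by using one more bit of each stored hash."""
        old_size = len(self.buckets)
        self.bits += 1
        self.buckets.extend([] for _ in range(old_size))
        moved = 0
        for i in range(old_size):
            stay, move = [], []
            for entry in self.buckets[i]:
                # The newly used bit decides: stay at i or move to i + old_size.
                (move if entry[0] & old_size else stay).append(entry)
            self.buckets[i] = stay
            self.buckets[i + old_size] = move
            moved += len(move)
        self.last_moved = moved


table = GrowableTable()
for number in range(40):
    table.put(number, number * number)
```

No hash value is recomputed during growth; each entry either stays in its bucket or moves to exactly one new bucket, chosen by a single additional bit. When that bit is evenly distributed, about half of the entries move.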
The elements in PHP’s arrays are placed in memory through a hash function. However, all elements are linked together in the order in which they were created. The links are used to support iterative access to elements through the current and next functions.
A record is an aggregate of data elements in which the individual elements are identified by names and accessed through offsets from the beginning of the structure.
There is frequently a need in programs to model a collection of data in which the individual elements are not of the same type or size. For example, information about a college student might include name, student number, grade point average, and so forth. A data type for such a collection might use a character string for the name, an integer for the student number, a floating-point number for the grade point average, and so forth. Records are designed for this kind of need.
It may appear that records and heterogeneous arrays are the same, but that is not the case. The elements of a heterogeneous array are all references to data objects that reside in scattered locations, often on the heap. The elements of a record are of potentially different sizes and reside in adjacent memory locations.
Records have been part of all of the most popular programming languages, except pre-90 versions of Fortran, since the early 1960s, when they were introduced by COBOL. In some languages that support object-oriented programming, data classes serve as records.
In C, C++, C#, and Swift, records are supported with the struct data type. In C++, structures are a minor variation on classes. In C#, structs also are related to classes, but are quite different from them. C# structs are stack-allocated value types, as opposed to class objects, which are heap-allocated reference types. Structs in C++ and C# are normally used as encapsulation structures, rather than data structures. They are further discussed in this capacity in Chapter 11. Structs are also included in ML and F#.
In Python and Ruby, records can be implemented as hashes, which themselves can be elements of arrays.
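A minimal Python sketch of this approach follows, using the college student example from earlier in the section. The field names and all of the data values are hypothetical.

```python
# Each record is a dictionary; the records are elements of a list.
# All names and values are made up for illustration.
students = [
    {"name": "Ada", "student_number": 1001, "gpa": 3.9},
    {"name": "Ben", "student_number": 1002, "gpa": 3.1},
    {"name": "Cy", "student_number": 1003, "gpa": 3.6},
]

# Fields are referenced by name rather than by position.
second_name = students[1]["name"]
honor_roll = [s["name"] for s in students if s["gpa"] >= 3.5]
```

Unlike a record in a language such as C, nothing here fixes the set of field names or their types, so a misspelled key is detected only at run time.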
The following sections describe how records are declared or defined, how references to fields within records are made, and the common record operations.
The following design issues are specific to records:
What is the syntactic form of references to fields?
Are elliptical references allowed?
The fundamental difference between a record and an array is that record elements, or fields, are not referenced by indices. Instead, the fields are named with identifiers, and references to the fields are made using these identifiers. Another difference between arrays and records is that records in some languages are allowed to include unions, which are discussed in Section 6.10.
The COBOL form of a record declaration, which is part of the data division of a COBOL program, is illustrated in the following example:
01 EMPLOYEE-RECORD.
   02 EMPLOYEE-NAME.
      05 FIRST PICTURE IS X(20).
      05 Middle PICTURE IS X(10).
      05 LAST PICTURE IS X(20).
   02 HOURLY-RATE PICTURE IS 99V99.
The EMPLOYEE-RECORD record consists of the EMPLOYEE-NAME record and the HOURLY-RATE field. The numerals 01, 02, and 05 that begin the lines of the record declaration are level numbers, which indicate by their relative values the hierarchical structure of the record. Any line that is followed by a line with a higher-level number is itself a record. The PICTURE clauses show the formats of the field storage locations, with X(20) specifying 20 alphanumeric characters and 99V99 specifying four decimal digits with the decimal point in the middle.
In Java, records can be defined as data classes, with nested records defined as nested classes. Data members of such classes serve as the record fields.
References to the individual fields of records are syntactically specified by several different methods, two of which name the desired field and its enclosing records. COBOL field references have the form
field_name OF record_name_1 OF . . . OF record_name_n
where the first record named is the smallest or innermost record that contains the field. The next record name in the sequence is that of the record that contains the previous record, and so forth. For example, the Middle field in the COBOL record example above can be referenced with
Middle OF EMPLOYEE-NAME OF EMPLOYEE-RECORD
Most other languages use dot notation for field references, where the components of the reference are connected with periods. Names in dot notation have the opposite order of COBOL references: They use the name of the largest enclosing record first and the field name last. For example, if Middle is a field in the Employee_Name record, which is embedded in the Employee_Record record, it would be referenced with the following:
Employee_Record.Employee_Name.Middle
A fully qualified reference to a record field is one in which all intermediate record names, from the largest enclosing record to the specific field, are named in the reference. In the COBOL example above the field reference is fully qualified. As an alternative to fully qualified references, COBOL allows elliptical references to record fields. In an elliptical reference, the field is named, but any or all of the enclosing record names can be omitted, as long as the resulting reference is unambiguous in the referencing environment. For example, FIRST, FIRST OF EMPLOYEE-NAME, and FIRST OF EMPLOYEE-RECORD are elliptical references to the employee’s first name in the COBOL record declared above. Although elliptical references are a programmer convenience, they require a compiler to have elaborate data structures and procedures in order to correctly identify the referenced field. They are also somewhat detrimental to readability.
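The dot notation described above can be sketched in Python. This is a minimal illustration, assuming the record is modeled with nested dataclasses; the class names, field names, and sample values are hypothetical and only mirror the COBOL declaration.

```python
from dataclasses import dataclass


@dataclass
class EmployeeName:
    first: str
    middle: str
    last: str


@dataclass
class EmployeeRecord:
    employee_name: EmployeeName   # nested record
    hourly_rate: float


# Hypothetical sample data.
employee_record = EmployeeRecord(EmployeeName("Ada", "K", "Lovelace"), 42.50)

# Fully qualified reference: largest enclosing record first, field name last.
middle = employee_record.employee_name.middle
```

Python offers no counterpart to COBOL's elliptical references: every enclosing record must be named in the reference.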
Records are frequently valuable data types in programming languages. The design of record types is straightforward, and their use is safe.
Records and arrays are closely related structural forms, and therefore it is interesting to compare them. Arrays are used when all the data values have the same type and/or are processed in the same way. This processing is easily done when there is a systematic way of sequencing through the structure. Such processing is well supported by using dynamic subscripting as the addressing method.
Records are used when the collection of data values is heterogeneous and the different fields are not processed in the same way. Also, the fields of a record often need not be processed in a particular order. Field names are like literal, or constant, subscripts. Because they are static, they provide very efficient access to the fields. Dynamic subscripts could be used to access record fields, but doing so would disallow type checking and would also be slower.
Records and arrays represent thoughtful and efficient methods of fulfilling two separate but related applications of data structures.
The fields of records are stored in adjacent memory locations. But because the sizes of the fields are not necessarily the same, the access method used for arrays is not used for records. Instead, the offset address, relative to the beginning of the record, is associated with each field. Field accesses are all handled using these offsets. The compile-time descriptor for a record has the general form shown in Figure 6.7. Run-time descriptors for records are unnecessary.
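The per-field offsets mentioned above can be inspected with a short Python sketch. It assumes the record is modeled with the standard ctypes module, which lays out structures the way a C compiler would; the class and field names are hypothetical. The offset of hourly_rate depends on platform alignment rules, so the sketch does not assume a particular value for it.

```python
import ctypes


class EmployeeName(ctypes.Structure):
    _fields_ = [
        ("first", ctypes.c_char * 20),
        ("middle", ctypes.c_char * 10),
        ("last", ctypes.c_char * 20),
    ]


class EmployeeRecord(ctypes.Structure):
    _fields_ = [
        ("employee_name", EmployeeName),
        ("hourly_rate", ctypes.c_float),
    ]


# Each field descriptor records its offset from the start of the record.
name_offsets = {
    name: getattr(EmployeeName, name).offset for name, _ in EmployeeName._fields_
}

# The nested record occupies 50 bytes; padding may push hourly_rate further out.
rate_offset = EmployeeRecord.hourly_rate.offset
```

Because the offsets are fixed when the structure type is defined, a field access needs no run-time descriptor, only the record's base address plus a constant.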
A tuple is a data type that is similar to a record, except that the elements are not named.
Python includes an immutable tuple type. If a tuple needs to be changed, it can be converted to an array with the list function. After the change, it can be converted back to a tuple with the tuple function. One use of tuples is when an array must be write protected, such as when it is sent as a parameter to an external function and the user does not want the function to be able to modify the parameter.
Python’s tuples are closely related to its lists, except that tuples are immutable. A tuple is created by assigning a tuple literal, as in the following example:
myTuple = (3, 5.8, 'apple')
Notice that the elements of a tuple need not be of the same type.
The elements of a tuple can be referenced with indexing in brackets, as in the following:
myTuple[1]
This references the second element of the tuple, because tuple indexing begins at 0.
Tuples can be catenated with the plus (+) operator. They can be deleted with the del statement. There are also other operators and functions that operate on tuples.
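The tuple operations described above can be tried in a few lines of Python. This is a small sketch; the variable names are illustrative only.

```python
my_tuple = (3, 5.8, 'apple')

# Index 1 selects the second element, because indexing starts at 0.
second = my_tuple[1]

# Tuples are immutable: assigning to an element raises TypeError.
try:
    my_tuple[1] = 6.0
    mutated = True
except TypeError:
    mutated = False

# To "change" a tuple, convert it to a list, modify the list, and convert back.
as_list = list(my_tuple)
as_list[1] = 6.0
my_tuple = tuple(as_list)

# Catenation with + builds a new tuple; ('pear',) is a one-element tuple.
longer = my_tuple + ('pear',)
```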
ML includes a tuple data type. An ML tuple must have at least two elements, whereas Python’s tuples can be empty or contain one element. As in Python, an ML tuple can include elements of mixed types. The following statement creates a tuple:
val myTuple = (3, 5.8, "apple");
The syntax of a tuple element access is as follows:
#1(myTuple);
This references the first element of the tuple.
A new tuple type can be defined in ML with a type declaration, such as the following:
type intReal = int * real;
Values of this type consist of an integer and a real. The asterisk can be misleading. It is used to separate the tuple components, indicating a type product, and has nothing to do with arithmetic.
F# also has tuples. A tuple is created by assigning a tuple value, which is a list of expressions separated by commas and delimited by parentheses, to a name in a let statement. If a tuple has two elements, they can be referenced with the functions fst and snd, respectively. The elements of a tuple with more than two elements are often referenced with a tuple pattern on the left side of a let statement. A tuple pattern is simply a sequence of names, one for each element of the tuple, with or without the delimiting parentheses. When a tuple pattern is the left side of a let construct, it is a multiple assignment. For example, consider the following let constructs:
let tup = (3, 5, 7);;
let a, b, c = tup;;
This assigns 3 to a, 5 to b, and 7 to c.
Tuples are used in Python, ML, and F# to allow functions to return multiple values. In Swift, tuples are passed by value, so they are sometimes used to pass data to a function when the function is not to change that data.
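Returning multiple values through a tuple can be sketched as follows in Python. The function min_max is a hypothetical helper introduced only for this illustration.

```python
def min_max(values):
    """Return the smallest and largest elements as a two-element tuple."""
    return min(values), max(values)


# The tuple pattern on the left side unpacks the returned tuple.
low, high = min_max([4, 9, 1, 7])
```

The assignment to low and high plays the same role as the tuple pattern on the left side of an F# let construct.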
Lists were first supported in the first functional programming language, Lisp. They have always been part of the functional languages, but in recent years they have found their way into some imperative languages.
Lists in Scheme and Common Lisp are delimited by parentheses and the elements are not separated by any punctuation. For example,
(A B C D)
Nested lists have the same form, so we could have
(A (B C) D)
In this list, (B C) is a list nested inside the outer list.
Data and code have the same syntactic form in Lisp and its descendants. If the list (A B C) is interpreted as code, it is a call to the function A with parameters B and C.
The fundamental list operations in Scheme are two functions that take lists apart and two that build lists. The CAR function returns the first element of its list parameter. Consider the following example:
(CAR '(A B C))
The quote before the parameter list prevents the interpreter from treating the list as a call to the function A with the parameters B and C, which it would otherwise try to evaluate. This call to CAR returns A.
The CDR function returns its parameter list minus its first element. Consider the following example:
(CDR '(A B C))
This function call returns the list (B C).
Common Lisp also has the functions FIRST (same as CAR), SECOND, . . . , TENTH, which return the element of their list parameters that is specified by their names.
In Scheme and Common Lisp, new lists are constructed with the CONS and LIST functions. The function CONS takes two parameters and returns a new list with its first parameter as the first element and its second parameter as the remainder of that list. For example, consider the following:
(CONS 'A '(B C))
This call returns the new list (A B C).
The LIST function takes any number of parameters and returns a new list with the parameters as its elements. For example, consider the following call to LIST:
(LIST 'A 'B '(C D))
This call returns the new list (A B (C D)).
ML has lists and list operations, although their appearance is not like those of Scheme. Lists are specified in square brackets, with the elements separated by commas, as in the following list of integers:
[5, 7, 9]
[] is the empty list, which could also be specified with nil.
The Scheme CONS function is implemented as a binary infix operator in ML, represented as ::. For example,
3 :: [5, 7, 9]
returns the following new list: [3, 5, 7, 9].
The elements of a list must be of the same type, so the following list would be illegal:
[5, 7.3, 9]
ML has functions that correspond to Scheme’s CAR and CDR, named hd (head) and tl (tail). For example,
hd [5, 7, 9] is 5
tl [5, 7, 9] is [7, 9]
Lists and list operations in Scheme and ML are more fully discussed in Chapter 15.
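For readers more familiar with Python, the four fundamental list operations can be approximated as below. This is only an analogy: the functions car, cdr, and cons are hypothetical helpers, and Python slicing copies elements, whereas Lisp's CDR returns the existing tail of the list without copying.

```python
def car(lst):
    """First element of a nonempty list (like CAR, or hd in ML)."""
    return lst[0]


def cdr(lst):
    """The list minus its first element (like CDR, or tl in ML)."""
    return lst[1:]


def cons(item, lst):
    """A new list with item in front of lst (like CONS, or :: in ML)."""
    return [item] + lst


head = car(['A', 'B', 'C'])
tail = cdr(['A', 'B', 'C'])
built = cons('A', ['B', 'C'])
nested = ['A', 'B', ['C', 'D']]   # what (LIST 'A 'B '(C D)) would produce
```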
Lists in F# are related to those of ML with a few notable differences. Elements of a list in F# are separated by semicolons, rather than the commas of ML. The operations corresponding to hd and tl are provided as functions of the List module, named List.head and List.tail in current versions of F# (early versions also accepted List.hd and List.tl), as in List.head [1; 3; 5; 7], which returns 1. The CONS operation of F# is specified as two colons, as in ML.
Python includes a list data type, which also serves as Python’s arrays. Unlike the lists of Scheme, Common Lisp, ML, and F#, the lists of Python are mutable. They can contain any data value or object. A Python list is created with an assignment of a list value to a name. A list value is a sequence of expressions that are separated by commas and delimited with brackets. For example, consider the following statement:
myList = [3, 5.8, "grape"]
The elements of a list are referenced with subscripts in brackets, as in the following example:
x = myList[1]
This statement assigns 5.8 to x. The elements of a list are indexed starting at zero. List elements also can be updated by assignment. A list element can be deleted with del, as in the following statement:
del myList[1]
This statement removes the second element of myList.
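The statements above can be combined into one short, runnable sequence. The update of the first element is an added illustration, not part of the original example.

```python
my_list = [3, 5.8, "grape"]

x = my_list[1]      # subscripts start at zero, so this is 5.8
my_list[0] = 4      # lists are mutable: update an element in place
del my_list[1]      # remove the second element; later elements shift left
```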
Python includes a powerful mechanism for creating arrays called list comprehensions. A list comprehension is an idea derived from set notation. It first appeared in the functional programming language Haskell (see Chapter 15). In a list comprehension, a function is applied to each of the elements of a given array and a new array is constructed from the results. The syntax of a Python list comprehension is as follows:
[expression for iterate_var in array if condition]
Consider the following example:
[x * x for x in range(12) if x % 3 == 0]
The range function produces the values [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]; the upper bound, 12, is not included. The conditional filters out all numbers in the array that are not evenly divisible by 3. Then, the expression squares the remaining numbers. The results of the squaring are collected in an array, which is returned. This list comprehension returns the following array:
[0, 9, 36, 81]
Slices of lists are also supported in Python.
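The list comprehension and slices described above can be checked with a short Python sketch. The slice examples are illustrative additions; the variable names are arbitrary.

```python
numbers = list(range(12))   # 0 through 11; the upper bound 12 is excluded

# Keep the multiples of 3, then square each one.
squares = [x * x for x in range(12) if x % 3 == 0]

# Slices select a subsequence: [start:stop:step], with stop excluded.
first_three = numbers[:3]
every_fourth = numbers[::4]
```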
Haskell’s list comprehensions have the following form:
[body | qualifiers]
For example, consider the following definition of a list:
[n * n | n <- [1..10]]
This defines a list of the squares of the numbers from 1 to 10.
F# includes list comprehensions, which in that language can also be used to create arrays. For example, consider the following statement:
let myArray = [|for i in 1 .. 5 -> (i * i) |];;
This statement creates the array [|1; 4; 9; 16; 25|] and names it myArray.
Recall from Section 6.5 that C# and Java support generic heap-dynamic collection classes, List and ArrayList, respectively. These structures are actually lists.
A union is a type whose variables may store different type values at different times during program execution. As an example of the need for a union type, consider a table of constants for a compiler, which is used to store the constants found in a program being compiled. One field of each table entry is for the value of the constant. Suppose that for a particular language being compiled, the types of constants were integer, floating-point, and Boolean. In terms of table management, it would be convenient if the same location, a table field, could store a value of any of these three types. Then all constant values could be addressed in the same way. The type of such a location is, in a sense, the union of the three value types it can store.
The problem of type checking union types, which is discussed in Section 6.12, is their major design issue.
C and C++ provide union constructs in which there is no language support for type checking. In C and C++, the union construct is used to specify union structures. The unions in these languages are called free unions, because programmers are allowed complete freedom from type checking in their use. For example, consider the following C union:
union flexType {
int intEl;
float floatEl;
};
union flexType el1;
float x;
. . .
el1.intEl = 27;
x = el1.floatEl;
This last assignment is not type checked, because the system cannot determine the current type of the current value of el1, so it assigns the bit string representation of 27 to the float variable x, which, of course, is nonsense.
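The effect of the unchecked assignment can be observed from Python. This is a sketch that assumes the standard ctypes module, whose Union class mimics a C union; it models the C example rather than reproducing it.

```python
import ctypes


class FlexType(ctypes.Union):
    # Both fields share the same storage, as in the C union above.
    _fields_ = [("intEl", ctypes.c_int), ("floatEl", ctypes.c_float)]


el1 = FlexType()
el1.intEl = 27

# Reads the bit pattern of the integer 27 as if it were a float.
x = el1.floatEl
```

On typical platforms with a 32-bit int and IEEE 754 single-precision floats, x is a tiny positive denormalized number on the order of 1e-44, not 27.0.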
Type checking of unions requires that each union construct include a type indicator. Such an indicator is called a tag, or discriminant, and a union with a discriminant is called a discriminated union. The first language to provide discriminated unions was ALGOL 68. They are now supported by ML, Haskell, and F#.
A union is declared in F# with a type statement using OR operators (|) to define the components. For example, we could have the following:
type intReal =
| IntValue of int
| RealValue of float;;
In this example, intReal is the union type. IntValue and RealValue are constructors. Values of type intReal can be created using the constructors as if they were functions, as in the following examples:
let ir1 = IntValue 17;;
let ir2 = RealValue 3.4;;
Accessing the value of a union is done with a pattern-matching structure. Pattern matching in F# is specified with the match reserved word. The general form of the construct is as follows:
match pattern with
| expression_list_1 -> expression_1
| . . .
| expression_list_n -> expression_n
The pattern can be any data type. The expression list can include wild card characters (_) or be solely a wild card character. For example, consider the following match construct:
let a = 7;;
let b = "grape";;
let x = match (a, b) with
| 4, "apple" -> apple
| _, "grape" -> grape
| _ -> fruit;;
To display the type of the intReal union, the following function could be used:
let printType value =
match value with
| IntValue value -> printfn "It is an integer"
| RealValue value -> printfn "It is a float";;
The following lines show calls to this function and the output:
printType ir1;;
It is an integer
printType ir2;;
It is a float
Unions are potentially unsafe constructs in some languages. They are one of the reasons why C and C++ are not strongly typed: These languages do not allow type checking of references to their unions. On the other hand, unions can be safely used, as in their design in ML, Haskell, and F#.
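A discriminated union can be approximated in Python by letting the class of each value act as the tag. This is a rough sketch, not a language-enforced construct; the names IntValue, RealValue, IntReal, and describe mirror the F# example and are otherwise hypothetical.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class IntValue:
    value: int


@dataclass(frozen=True)
class RealValue:
    value: float


# The union type: a value is exactly one of the two variants.
IntReal = Union[IntValue, RealValue]


def describe(item: IntReal) -> str:
    """Inspect the tag (the class) before touching the stored value."""
    if isinstance(item, IntValue):
        return "It is an integer"
    if isinstance(item, RealValue):
        return "It is a float"
    raise TypeError("not an IntReal value")


ir1 = IntValue(17)
ir2 = RealValue(3.4)
```

Unlike F#, Python checks the tag only at run time, so this conveys the idea of a discriminant without the static guarantees.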
Neither Java nor C# includes unions, which may be reflective of the growing concern for safety in some programming languages.
Unions are implemented by simply using the same address for every possible variant. Sufficient storage for the largest variant is allocated.
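This storage layout can be confirmed with Python's ctypes module. The sketch assumes a hypothetical union of the three constant types from the compiler-table example earlier (Boolean, integer, floating-point).

```python
import ctypes


class ConstantValue(ctypes.Union):
    _fields_ = [
        ("flag", ctypes.c_bool),      # 1 byte
        ("count", ctypes.c_int32),    # 4 bytes
        ("ratio", ctypes.c_double),   # 8 bytes, the largest variant
    ]


# Every variant starts at the same address, offset 0.
offsets = [
    ConstantValue.flag.offset,
    ConstantValue.count.offset,
    ConstantValue.ratio.offset,
]

# The union is as large as its largest variant.
size = ctypes.sizeof(ConstantValue)
```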
A pointer type is one in which the variables have a range of values that consists of memory addresses and a special value, nil. The value nil is not a valid address and is used to indicate that a pointer cannot currently be used to reference a memory cell.
Pointers are designed for two distinct kinds of uses. First, pointers provide some of the power of indirect addressing, which is frequently used in assembly language programming. Second, pointers provide a way to manage dynamic storage. A pointer can be used to access a location in the heap, the area in which storage is dynamically allocated.
Variables that are dynamically allocated from the heap are called heap-dynamic variables. They often do not have identifiers associated with them and thus can be referenced only by pointer or reference type variables. Variables without names are called anonymous variables. It is in this latter application area of pointers that the most important design issues arise.
Pointers, unlike arrays and records, are not structured types, although they are defined using a type operator (* in C and C++). Furthermore, they are also different from scalar variables because they are used to reference some other variable, rather than being used to store data. These two categories of variables are called reference types and value types, respectively.
Both kinds of uses of pointers add writability to a language. For example, suppose it is necessary to implement a dynamic structure like a binary tree in a language that does not have pointers or dynamic storage. This would require the programmer to provide and maintain a pool of available tree nodes, which would probably be implemented in parallel arrays. Also, it would be necessary for the programmer to guess the maximum number of required nodes. This is clearly an awkward and error-prone way to deal with binary trees.
Reference variables, which are discussed in Section 6.11.6, are closely related to pointers.
The primary design issues particular to pointers are the following:
What are the scope and lifetime of a pointer variable?
What is the lifetime of a heap-dynamic variable (the value a pointer references)?
Are pointers restricted as to the type of value to which they can point?
Are pointers used for dynamic storage management, indirect addressing, or both?
Should the language support pointer types, reference types, or both?
Languages that provide a pointer type usually include two fundamental pointer operations: assignment and dereferencing. The first operation sets a pointer variable’s value to some useful address. If pointer variables are used only to manage dynamic storage, then the allocation mechanism, whether by operator or built-in subprogram, serves to initialize the pointer variable. If pointers are used for indirect addressing to variables that are not heap dynamic, then there must be an explicit operator or built-in subprogram for fetching the address of a variable, which can then be assigned to the pointer variable.
An occurrence of a pointer variable in an expression can be interpreted in two distinct ways. First, it could be interpreted as a reference to the contents of the memory cell to which it is bound, which in the case of a pointer is an address. This is exactly how a nonpointer variable in an expression would be interpreted, although in that case its value likely would not be an address. However, the pointer also could be interpreted as a reference to the value in the memory cell pointed to by the memory cell to which the pointer variable is bound. In this case, the pointer is interpreted as an indirect reference. The former case is a normal pointer reference; the latter is the result of dereferencing the pointer. Dereferencing, which takes a reference through one level of indirection, is the second fundamental pointer operation.
Dereferencing of pointers can be either explicit or implicit. In many contemporary languages, it occurs only when explicitly specified. In C++, it is explicitly specified with the asterisk (*) as a prefix unary operator. Consider the following example of dereferencing: If ptr is a pointer variable with the value 7080 and the cell whose address is 7080 has the value 206, then the assignment
j = *ptr
sets j to 206. This process is shown in Figure 6.8.
When pointers point to records, the syntax of the references to the fields of these records varies among languages. In C and C++, there are two ways a pointer to a record can be used to reference a field in that record. If a pointer variable p points to a record with a field named age, (*p).age can be used to refer to that field. The operator ->, when used between a pointer to a struct and a field of that struct, combines dereferencing and field reference. For example, the expression p -> age is equivalent to (*p).age.
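Python has no pointer type of its own, but its ctypes module exposes C-style pointers, which is enough to imitate the dereferencing and field references described above. This is a sketch under that assumption; the Person structure and its sample value are hypothetical.

```python
import ctypes


class Person(ctypes.Structure):
    _fields_ = [("age", ctypes.c_int)]


# A cell holding 206 and a pointer to it.
cell = ctypes.c_int(206)
ptr = ctypes.pointer(cell)
j = ptr.contents.value              # comparable to  j = *ptr

# A pointer to a record, used to reach one of its fields.
person = Person(age=30)
p = ctypes.pointer(person)
age_via_contents = p.contents.age   # comparable to  (*p).age
age_via_index = p[0].age            # p[0] also dereferences the pointer
```

ctypes has no counterpart to the -> operator; the dereference and the field reference are always written as two separate steps.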
Languages that provide pointers for the management of a heap must include an explicit allocation operation. Allocation is sometimes specified with a subprogram, such as malloc in C. In languages that support object-oriented programming, allocation of heap objects is often specified with the new operator. C++, which does not provide implicit deallocation, uses delete as its deallocation operator.
The first high-level programming language to include pointer variables was PL/I, in which pointers could be used to refer to both heap-dynamic variables and other program variables. The pointers of PL/I were highly flexible, but their use could lead to several kinds of programming errors. Some of the problems of PL/I pointers are also present in the pointers of subsequent languages. Some recent languages, such as Java, have replaced pointers completely with reference types, which, along with implicit deallocation, minimize the primary problems with pointers. A reference type is really only a pointer with restricted operations.
A dangling pointer, or dangling reference, is a pointer that contains the address of a heap-dynamic variable that has been deallocated. Dangling pointers are dangerous for several reasons. First, the location being pointed to may have been reallocated to some new heap-dynamic variable. If the new variable is not the same type as the old one, type checks of uses of the dangling pointer are invalid. Even if the new dynamic variable is the same type, its new value will have no relationship to the old pointer’s dereferenced value. Furthermore, if the dangling pointer is used to change the heap-dynamic variable, the value of the new heap-dynamic variable will be destroyed. Finally, it is possible that the location now is being temporarily used by the storage management system, possibly as a pointer in a chain of available blocks of storage, thereby allowing a change to the location to cause the storage manager to fail.
The following sequence of operations creates a dangling pointer in many languages:
A new heap-dynamic variable is created and pointer p1 is set to point to it.
Pointer p2 is assigned p1’s value.
The heap-dynamic variable pointed to by p1 is explicitly deallocated (possibly setting p1 to nil), but p2 is not changed by the operation. p2 is now a dangling pointer. If the deallocation operation did not change p1, both p1 and p2 would be dangling. (Of course, this is a problem of aliasing—p1 and p2 are aliases.)
For example, in C++ we could have the following:
int * arrayPtr1;
int * arrayPtr2 = new int[100];
arrayPtr1 = arrayPtr2;
delete [] arrayPtr2;
// Now, arrayPtr1 is dangling, because the heap storage
// to which it was pointing has been deallocated.
In C++, both arrayPtr1 and arrayPtr2 are now dangling pointers, because the C++ delete operator has no effect on the value of its operand pointer. In C++, it is common (and safe) to follow a delete operator with an assignment of zero, which represents null, to the pointer whose pointed-to value has been deallocated.
Notice that the explicit deallocation of dynamic variables is the cause of dangling pointers.
Pascal included an explicit deallocate operator: dispose. Because of the problem of dangling pointers caused by dispose, some Pascal implementations simply ignored dispose when it appeared in a program. Although this effectively prevents dangling pointers, it also disallows the reuse of heap storage that the program no longer needs. Recall that Pascal initially was designed as a teaching language, rather than as an industrial tool.
A lost heap-dynamic variable is an allocated heap-dynamic variable that is no longer accessible to the user program. Such variables are often called garbage, because they are not useful for their original purpose, and they also cannot be reallocated for some new use in the program. Lost heap-dynamic variables are most often created by the following sequence of operations:
1. Pointer p1 is set to point to a newly created heap-dynamic variable.
2. p1 is later set to point to another newly created heap-dynamic variable.
The first heap-dynamic variable is now inaccessible, or lost. This is sometimes called memory leakage. Memory leakage is a problem, regardless of whether the language uses implicit or explicit deallocation. In the following sections, we investigate how language designers have dealt with the problems of dangling pointers and lost heap-dynamic variables.
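The two-step sequence can be sketched in C++ as follows. The Tracked type and its live counter are hypothetical instrumentation, and debug_copy is a demo-only alias that a real leaking program would not have; it exists only so the sketch can report the lost object and then clean up.

```cpp
#include <cassert>

// Hypothetical instrumented type: counts the instances currently allocated.
struct Tracked {
    static int live;
    Tracked()  { ++live; }
    ~Tracked() { --live; }
};
int Tracked::live = 0;

// Performs the two steps and returns how many objects remain allocated
// after the program has freed everything it can still reach through p1.
int lost_variable_demo() {
    Tracked* p1 = new Tracked;        // step 1: p1 points at a new heap-dynamic variable
    Tracked* debug_copy = p1;         // demo-only alias (assumption, see lead-in)
    p1 = new Tracked;                 // step 2: p1 is redirected; the first object is lost to p1
    delete p1;                        // the second object can still be freed through p1
    int still_allocated = Tracked::live;  // the first object is still allocated
    delete debug_copy;                // cleanup that the leaking program could not perform
    assert(Tracked::live == 0);
    return still_allocated;
}
```

Without debug_copy, no pointer in the program would hold the address of the first object, so it could be neither used nor freed.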
In C and C++, pointers can be used in the same ways as addresses are used in assembly languages. This means they are extremely flexible but must be used with great care. This design offers no solutions to the dangling pointer or lost heap-dynamic variable problems. However, the fact that pointer arithmetic is possible in C and C++ makes their pointers more interesting than those of the other programming languages.
C and C++ pointers can point at any variable, regardless of where it is allocated. In fact, they can point anywhere in memory, whether there is a variable there or not, which is one of the dangers of such pointers.
In C and C++, the asterisk (*) denotes the dereferencing operation and the ampersand (&) denotes the operator for producing the address of a variable. For example, consider the following code:
int *ptr;
int count, init;
. . .
ptr = &init;
count = *ptr;
The assignment to the variable ptr sets it to the address of init. The assignment to count dereferences ptr to produce the value at init, which is then assigned to count. So, the effect of the two assignment statements is to assign the value of init to count. Notice that the declaration of a pointer specifies its domain type.
Pointers can be assigned the address value of any variable of the correct domain type, or they can be assigned the constant zero, which is used for nil.
Pointer arithmetic is also possible in some restricted forms. For example, if ptr is a pointer variable that is declared to point at some variable of some data type, then
ptr + index
is a legal expression. The semantics of such an expression is as follows. Instead of simply adding the value of index to ptr, the value of index is first scaled by the size of the memory cell (in memory units) to which ptr is pointing (its base type). For example, if ptr points to a memory cell for a type that is four memory units in size, then index is multiplied by 4, and the result is added to ptr. The primary purpose of this sort of address arithmetic is array manipulation. The following discussion is related to single-dimensioned arrays only.
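The scaling can be observed directly by measuring, in bytes, how far ptr + index lies from ptr. The sketch below does this for an int pointer; the function name byte_offset and the 16-element array are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Returns the distance in bytes between ptr and ptr + index for an int pointer.
std::ptrdiff_t byte_offset(std::ptrdiff_t index) {
    static int cells[16] = {};
    assert(index >= 0 && index <= 16);   // stay within, or one past, the array
    int* ptr = cells;
    int* moved = ptr + index;            // index is scaled by sizeof(int)
    // Viewing both addresses as char * measures the difference in bytes.
    return reinterpret_cast<char*>(moved) - reinterpret_cast<char*>(ptr);
}
```

On an implementation where int occupies four bytes, byte_offset(3) is 12, not 3.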
In C and C++, all arrays use zero as the lower bound of their subscript ranges, and array names without subscripts always refer to the address of the first element. Consider the following declarations:
int list [10];
int *ptr;
Now consider the assignment
ptr = list;
This assigns the address of list[0] to ptr. Given this assignment, the following are true:
*(ptr + 1) is equivalent to list[1].
*(ptr + index) is equivalent to list[index].
ptr[index] is equivalent to list[index].
It is clear from these statements that the pointer operations include the same scaling that is used in indexing operations. Furthermore, pointers to arrays can be indexed as if they were array names.
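These equivalences can be checked mechanically. The following sketch fills list with arbitrary values and compares the three forms for every valid index; the function name equivalences_hold is hypothetical.

```cpp
#include <cassert>

// Checks the listed equivalences for every valid index of a ten-element array.
bool equivalences_hold() {
    int list[10];
    for (int i = 0; i < 10; ++i)
        list[i] = i * i;                   // arbitrary distinct contents
    int* ptr = list;                       // the array name yields the address of list[0]
    assert(ptr == &list[0]);
    for (int index = 0; index < 10; ++index) {
        if (*(ptr + index) != list[index]) return false;
        if (ptr[index] != list[index]) return false;
        if (&ptr[index] != &list[index]) return false;  // same memory cell, not merely an equal value
    }
    return *(ptr + 1) == list[1];
}
```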
Pointers in C and C++ can point to functions. This feature is used to pass functions as parameters to other functions. Pointers are also used for parameter passing, as discussed in Chapter 9.
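A brief sketch of a function passed as a parameter through a pointer follows. The functions twice, square, and apply_to_sum are hypothetical names chosen for the illustration.

```cpp
#include <cassert>

int twice(int x)  { return 2 * x; }
int square(int x) { return x * x; }

// The first parameter is a pointer to a function that takes and returns int.
int apply_to_sum(int (*f)(int), int a, int b) {
    assert(f != nullptr);   // calling through a null function pointer is an error
    return f(a + b);        // the call through the pointer looks like an ordinary call
}
```

A call such as apply_to_sum(square, 1, 2) passes the address of square; a function name used as an argument is converted to a pointer automatically.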
C and C++ include pointers of type void *, which can point at values of any type. In effect they are generic pointers. However, type checking is not a problem with void * pointers, because these languages disallow dereferencing them. One common use of void * pointers is as the types of parameters of functions that operate on memory. For example, suppose we wanted a function to move a sequence of bytes of data from one place in memory to another. It would be most general if it could be passed two pointers of any type. This would be legal if the corresponding formal parameters in the function were void * type. The function could then convert them to char * type and do the operation, regardless of what type pointers were sent as actual parameters.
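The byte-moving function described above might be sketched as follows. The name move_bytes is hypothetical, and the sketch assumes the two regions do not overlap; the standard library already provides std::memcpy and std::memmove for this purpose.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical byte mover: accepts pointers to objects of any type via void *.
// Assumes the source and destination regions do not overlap.
void move_bytes(void* dest, const void* src, std::size_t count) {
    char* d = static_cast<char*>(dest);             // void * cannot be dereferenced,
    const char* s = static_cast<const char*>(src);  // so convert to char * first
    for (std::size_t i = 0; i < count; ++i)
        d[i] = s[i];
}

// Moves an array of doubles through the generic interface.
double move_demo() {
    double from[3] = {1.5, 2.5, 3.5};
    double to[3] = {0.0, 0.0, 0.0};
    move_bytes(to, from, sizeof(from));   // double * converts implicitly to void *
    assert(to[0] == 1.5);
    return to[2];
}
```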
A reference type variable is similar to a pointer, with one important and fundamental difference: A pointer refers to an address in memory, while a reference refers to an object or a value in memory. As a result, although it is natural to perform arithmetic on addresses, it is not sensible to do arithmetic on references.
C++ includes a special kind of reference type that is used primarily for the formal parameters in function definitions. A C++ reference type variable is a constant pointer that is always implicitly dereferenced. Because a C++ reference type variable is a constant, it must be initialized with the address of some variable in its definition, and after initialization a reference type variable can never be set to reference any other variable. The implicit dereference prevents assignment to the address value of a reference variable.
Reference type variables are specified in definitions by preceding their names with ampersands (&). For example,
int result = 0;
int &ref_result = result;
. . .
ref_result = 100;
In this code segment, result and ref_result are aliases.
When used as formal parameters in function definitions, C++ reference types provide for two-way communication between the caller function and the called function. This is not possible with nonpointer primitive parameter types, because C++ parameters are passed by value. Passing a pointer as a parameter accomplishes the same two-way communication, but pointer formal parameters require explicit dereferencing, making the code less readable and less safe. Reference parameters are referenced in the called function exactly as are other parameters. The calling function need not specify that a parameter whose corresponding formal parameter is a reference type is anything unusual. The compiler passes addresses, rather than values, to reference parameters.
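The contrast between pointer parameters and reference parameters can be seen in two versions of a swap function. The function names are hypothetical; the sketch shows only the difference in notation at the definition and at the call.

```cpp
#include <cassert>

// Pointer version: the callee dereferences explicitly and the caller passes addresses.
void swap_by_pointer(int* a, int* b) {
    int temp = *a;
    *a = *b;
    *b = temp;
}

// Reference version: the parameters are used like ordinary variables.
void swap_by_reference(int& a, int& b) {
    int temp = a;
    a = b;
    b = temp;
}

// Swaps twice, once in each style, so the values end where they started.
bool swap_demo() {
    int x = 1, y = 2;
    swap_by_pointer(&x, &y);     // the caller must take addresses
    assert(x == 2 && y == 1);
    swap_by_reference(x, y);     // the call looks like pass-by-value
    return x == 1 && y == 2;
}
```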
In their quest for increased safety over C++, the designers of Java removed C++-style pointers altogether. Unlike C++ reference variables, Java reference variables can be assigned to refer to different class instances; they are not constants. All Java class instances are referenced by reference variables. That is, in fact, the only use of reference variables in Java. These issues are discussed further in Chapter 12.
In the following, String is a standard Java class:
String str1;
. . .
str1 = "This is a Java literal string";
In this code, str1 is defined to be a reference to a String class instance or object. It is initially set to null. The subsequent assignment sets str1 to reference the String object, "This is a Java literal string".
Because Java class instances are implicitly deallocated (there is no explicit deallocation operator), there cannot be dangling references in Java.
C# includes both the references of Java and the pointers of C++. However, the use of pointers is strongly discouraged. In fact, any subprogram that uses pointers must include the unsafe modifier. Note that although objects pointed to by references are implicitly deallocated, that is not true for objects pointed to by pointers. Pointers were included in C# primarily to allow C# programs to interoperate with C and C++ code.
All variables in the object-oriented languages Smalltalk, Python, and Ruby are references. They are always implicitly dereferenced. Furthermore, the direct values of these variables cannot be accessed.
The problems of dangling pointers and garbage have already been discussed at length. The problems of heap management are discussed in Section 6.11.7.3.
Pointers have been compared with the goto. The goto statement widens the range of statements that can be executed next. Pointer variables widen the range of memory cells that can be referenced by a variable. Perhaps the most damning statement about pointers was made by Hoare (1973): “Their introduction into high-level languages has been a step backward from which we may never recover.”
On the other hand, pointers are essential in some kinds of programming applications. For example, pointers are necessary to write device drivers, in which specific absolute addresses must be accessed.
The references of Java and C# provide some of the flexibility and the capabilities of pointers, without the hazards. It remains to be seen whether programmers will be willing to trade the full power of C and C++ pointers for the greater safety of references. The extent to which C# programs use pointers will be one measure of this.
In most languages, pointers are used in heap management. The same is true for Java and C# references, as well as the variables in Smalltalk and Ruby, so we cannot treat pointers and references separately. First, we briefly describe how pointers and references are represented internally. We then discuss two possible solutions to the dangling pointer problem. Finally, we describe the major problems with heap-management techniques.
In most larger computers, pointers and references are single values stored in memory cells. However, in early microcomputers based on Intel microprocessors, addresses have two parts: a segment and an offset. So, pointers and references are implemented in these systems as pairs of 16-bit cells, one for each of the two parts of an address.
There have been several proposed solutions to the dangling-pointer problem. Among these are tombstones (Lomet, 1975), in which every heap-dynamic variable includes a special cell, called a tombstone, that is itself a pointer to the heap-dynamic variable. The actual pointer variable points only at tombstones and never to heap-dynamic variables. When a heap-dynamic variable is deallocated, the tombstone remains but is set to nil, indicating that the heap-dynamic variable no longer exists. This approach prevents a pointer from ever pointing to a deallocated variable. Any reference to any pointer that points to a nil tombstone can be detected as an error.
Tombstones are costly in both time and space. Because tombstones are never deallocated, their storage is never reclaimed. Every access to a heap-dynamic variable through a tombstone requires one more level of indirection, which requires an additional machine cycle on most computers. Apparently none of the designers of the more popular languages have found the additional safety to be worth this additional cost, because no widely used language uses tombstones.
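To make the mechanism concrete, the following is a minimal sketch of tombstones for int variables. It is not taken from any real implementation: the names Tombstone, TombPtr, ts_new, ts_dispose, and ts_read are hypothetical, and the ever-growing pool models the fact that tombstones are never reclaimed.

```cpp
#include <cassert>
#include <deque>

// A tombstone is a cell that points at the heap-dynamic variable.
struct Tombstone {
    int* target;
};

// Tombstones are never deallocated, so this pool only grows.
// std::deque keeps element addresses stable across push_back.
static std::deque<Tombstone> tombstone_pool;

// Program-visible pointers refer to tombstones, never directly to variables.
using TombPtr = Tombstone*;

TombPtr ts_new(int value) {
    tombstone_pool.push_back(Tombstone{new int(value)});
    return &tombstone_pool.back();
}

void ts_dispose(TombPtr p) {
    delete p->target;
    p->target = nullptr;   // the tombstone remains, but is set to nil
}

// Every access pays one extra level of indirection; a nil tombstone is reported.
bool ts_read(TombPtr p, int& out) {
    if (p->target == nullptr) return false;   // detected: the variable no longer exists
    out = *p->target;
    return true;
}

bool tombstone_demo() {
    TombPtr p1 = ts_new(42);
    TombPtr p2 = p1;                  // alias: both name the same tombstone
    int v = 0;
    bool ok_before = ts_read(p2, v) && v == 42;
    ts_dispose(p1);
    bool caught = !ts_read(p2, v);    // the alias sees the nil tombstone
    assert(ok_before);
    return ok_before && caught;
}
```

The sketch shows both costs named above: ts_read follows two pointers instead of one, and nothing ever removes an entry from tombstone_pool.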
An alternative to tombstones is the locks-and-keys approach used in the implementation of UW-Pascal (Fischer and LeBlanc, 1977, 1980). In this compiler, pointer values are represented as ordered pairs (key, address), where the key is an integer value. Heap-dynamic variables are represented as the storage for the variable plus a header cell that stores an integer lock value. When a heap-dynamic variable is allocated, a lock value is created and placed both in the lock cell of the heap-dynamic variable and in the key cell of the pointer that is specified in the call to new. Every access to the dereferenced pointer compares the key value of the pointer to the lock value in the heap-dynamic variable. If they match, the access is legal; otherwise the access is treated as a run-time error. Any copies of the pointer value to other pointers must copy the key value. Therefore, any number of pointers can reference a given heap-dynamic variable. When a heap-dynamic variable is deallocated with dispose, its lock value is cleared to an illegal lock value. Then, if a pointer other than the one specified in the dispose is dereferenced, its address value will still be intact, but its key value will no longer match the lock, so the access will not be allowed.
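The locks-and-keys scheme can be sketched in the same style. This is an illustration under stated assumptions, not the UW-Pascal implementation: the names are hypothetical, zero is assumed to be the reserved illegal lock value, and cells come from a small fixed arena so that the header of a disposed cell remains readable.

```cpp
#include <cassert>

// Heap-dynamic variable: a header cell holding the lock, plus the storage itself.
struct LockedCell {
    unsigned lock;
    int value;
};

// Pointer value: the ordered pair (key, address).
struct KeyedPtr {
    unsigned key;
    LockedCell* address;
};

const unsigned ILLEGAL_LOCK = 0;     // assumed reserved value, never issued as a key
static unsigned next_lock = 1;
static LockedCell pool[64];          // assumed fixed arena for the sketch
static int pool_used = 0;

KeyedPtr lk_new(int value) {
    assert(pool_used < 64);
    LockedCell* cell = &pool[pool_used++];
    cell->lock = next_lock++;            // a fresh lock value for this variable
    cell->value = value;
    return KeyedPtr{cell->lock, cell};   // the same value becomes the pointer's key
}

void lk_dispose(KeyedPtr p) {
    if (p.key == p.address->lock)
        p.address->lock = ILLEGAL_LOCK;  // clear the lock
}

// Every dereference compares key and lock first.
bool lk_read(KeyedPtr p, int& out) {
    if (p.key != p.address->lock) return false;  // would be a run-time error
    out = p.address->value;
    return true;
}

bool locks_and_keys_demo() {
    KeyedPtr p1 = lk_new(7);
    KeyedPtr p2 = p1;                 // copying a pointer copies its key
    int v = 0;
    bool ok_before = lk_read(p2, v) && v == 7;
    lk_dispose(p1);
    bool caught = !lk_read(p2, v);    // address intact, key no longer matches
    return ok_before && caught;
}
```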
Of course, the best solution to the dangling-pointer problem is to take deallocation of heap-dynamic variables out of the hands of programmers. If programs cannot explicitly deallocate heap-dynamic variables, there will be no dangling pointers. To do this, the run-time system must implicitly deallocate heap-dynamic variables when they are no longer useful. Lisp systems have always done this. Both Java and C# also use this approach for their reference variables. Recall that C#’s pointers do not include implicit deallocation.
Heap management can be a very complex run-time process. We examine the process in two separate situations: one in which all heap storage is allocated and deallocated in units of a single size, and one in which variable-size segments are allocated and deallocated. Note that for deallocation, we discuss only implicit approaches. Our discussion will be brief and far from comprehensive, since a thorough analysis of these processes and their associated problems is not so much a language design issue as it is an implementation issue.
The simplest situation is when all allocation and deallocation is of single-size cells. It is further simplified when every cell already contains a pointer. This is the scenario of many implementations of Lisp, where the problems of dynamic storage allocation were first encountered on a large scale. All Lisp programs and most Lisp data consist of cells in linked lists.
In a single-size allocation heap, all available cells are linked together using the pointers in the cells, forming a list of available space. Allocation is a simple matter of taking the required number of cells from this list when they are needed. Deallocation is a much more complex process. A heap-dynamic variable can be pointed to by more than one pointer, making it difficult to determine when the variable is no longer useful to the program. Simply because one pointer is disconnected from a cell obviously does not make it garbage; there could be several other pointers still pointing to the cell.
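A list of available space for single-size cells might look like the sketch below. The four-cell heap and all names are assumptions made for illustration; the point is that allocation and release are each a few pointer assignments.

```cpp
#include <cassert>
#include <cstddef>

// Every cell has the same size and already contains a pointer field.
struct Cell {
    int data;
    Cell* next;   // doubles as the free-list link while the cell is unused
};

const std::size_t HEAP_CELLS = 4;      // assumed tiny heap for illustration
static Cell cell_heap[HEAP_CELLS];
static Cell* free_list = nullptr;

// Links every cell into the list of available space.
void init_heap() {
    free_list = nullptr;
    for (std::size_t i = 0; i < HEAP_CELLS; ++i) {
        cell_heap[i].next = free_list;
        free_list = &cell_heap[i];
    }
}

// Allocation: unlink the first available cell; nullptr means the list is empty.
Cell* allocate_cell() {
    Cell* cell = free_list;
    if (cell != nullptr) {
        free_list = cell->next;
        cell->next = nullptr;
    }
    return cell;
}

// Returning a cell is just as cheap; deciding when it is safe is the hard part.
void release_cell(Cell* cell) {
    assert(cell != nullptr);
    cell->next = free_list;
    free_list = cell;
}

// Counts how many cells can be allocated before the list runs out.
std::size_t count_allocatable() {
    init_heap();
    std::size_t n = 0;
    while (allocate_cell() != nullptr) ++n;
    return n;
}
```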
In Lisp, several of the most frequent operations in programs create collections of cells that are no longer accessible to the program and therefore should be deallocated (put back on the list of available space). One of the fundamental design goals of Lisp was to ensure that reclamation of unused cells would not be the task of the programmer but rather that of the run-time system. This goal left Lisp implementors with the fundamental design question: When should deallocation be performed?
There are several different approaches to garbage collection. The two most common traditional techniques are in some ways opposite processes. These are named reference counters, in which reclamation is incremental and is done when inaccessible cells are created, and mark-sweep, in which reclamation occurs only when the list of available space becomes empty. These two methods are sometimes called the eager approach and the lazy approach, respectively. Many variations of these two approaches have been developed. In this section, however, we discuss only the basic processes.
The reference counter method of storage reclamation accomplishes its goal by maintaining in every cell a counter that stores the number of pointers that are currently pointing at the cell. Embedded in the decrement operation for the reference counters, which occurs when a pointer is disconnected from the cell, is a check for a zero value. If the reference counter reaches zero, it means that no program pointers are pointing at the cell, and it has thus become garbage and can be returned to the list of available space.
There are three distinct problems with the reference counter method. First, if storage cells are relatively small, the space required for the counters is significant. Second, some execution time is obviously required to maintain the counter values. Every time a pointer value is changed, the cell to which it was pointing must have its counter decremented, and the cell to which it is now pointing must have its counter incremented. In a language like Lisp, in which nearly every action involves changing pointers, that can be a significant portion of the total execution time of a program. Of course, if pointer changes are not too frequent, this is not a problem. Some of the inefficiency of reference counters can be eliminated by an approach named deferred reference counting, which avoids reference counters for some pointers. The third problem is that complications arise when a collection of cells is connected circularly. The problem here is that each cell in the circular list has a reference counter value of at least 1, which prevents it from being collected and placed back on the list of available space. A solution to this problem can be found in Friedman and Wise (1979).
The advantage of the reference counter approach is that it is intrinsically incremental. Its actions are interleaved with those of the application, so it never causes significant delays in the execution of the application.
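The following sketch illustrates the counter maintenance and the circular-structure problem. The names are hypothetical, each cell is assumed to hold a single outgoing pointer, and a global tally named reclaimed stands in for returning cells to the list of available space.

```cpp
#include <cassert>

// Hypothetical cell with an embedded reference counter and one outgoing pointer.
struct RcCell {
    int count = 0;
    RcCell* link = nullptr;
};

static int reclaimed = 0;   // tally of cells returned to available space

void rc_release(RcCell* cell);

// Pointer assignment: increment the new target's counter, decrement the old one's.
void rc_assign(RcCell*& slot, RcCell* target) {
    if (target != nullptr) ++target->count;
    RcCell* old = slot;
    slot = target;
    rc_release(old);
}

// Decrement with the embedded check for zero.
void rc_release(RcCell* cell) {
    if (cell == nullptr) return;
    assert(cell->count > 0);
    if (--cell->count == 0) {
        rc_release(cell->link);   // the dead cell's own pointer is disconnected too
        cell->link = nullptr;
        ++reclaimed;
    }
}

// Chain root -> a -> b: dropping root reclaims both cells.
int rc_chain_demo() {
    RcCell a, b;
    RcCell* root = nullptr;
    rc_assign(root, &a);
    rc_assign(a.link, &b);
    reclaimed = 0;
    rc_assign(root, nullptr);
    return reclaimed;
}

// Cycle root -> c -> d -> c: dropping root reclaims nothing.
int rc_cycle_demo() {
    RcCell c, d;
    RcCell* root = nullptr;
    rc_assign(root, &c);
    rc_assign(c.link, &d);
    rc_assign(d.link, &c);
    reclaimed = 0;
    rc_assign(root, nullptr);     // c's counter drops from 2 to 1, never to 0
    return reclaimed;
}
```

In the cycle, both cells are unreachable after root is cleared, yet each still has a counter of 1, which is the third problem described above.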
The original mark-sweep process of garbage collection operates as follows: The run-time system allocates storage cells as requested and disconnects pointers from cells as necessary, without regard for storage reclamation (allowing garbage to accumulate), until it has allocated all available cells. At this point, a mark-sweep process is begun to gather all the garbage left floating around in the heap. To facilitate the process, every heap cell has an extra indicator bit or field that is used by the collection algorithm.
The mark-sweep process consists of three distinct phases. First, all cells in the heap have their indicators set to indicate they are garbage. This is, of course, a correct assumption for only some of the cells. The second part, called the marking phase, is the most difficult. Every pointer in the program is traced into the heap, and all reachable cells are marked as not being garbage. After this, the third phase, called the sweep phase, is executed: All cells in the heap that have not been specifically marked as still being used are returned to the list of available space.
To illustrate the flavor of algorithms used to mark the cells that are currently in use, we provide the following simple version of a marking algorithm. We assume that all heap-dynamic variables, or heap cells, consist of an information part; a part for the mark, named marker; and two pointers named llink and rlink. These cells are used to build directed graphs with at most two edges leading from any node. The marking algorithm traverses all spanning trees of the graphs, marking all cells that are found. Like other graph traversals, the marking algorithm uses recursion.
for every pointer r do
    mark(r)

void mark(void * ptr) {
    if (ptr != 0)
        if (*ptr.marker is not marked) {
            set *ptr.marker
            mark(*ptr.llink)
            mark(*ptr.rlink)
        }
}
An example of the actions of this procedure on a given graph is shown in Figure 6.9. This simple marking algorithm requires a great deal of storage (for stack space to support recursion). A marking process that does not require additional stack space was developed by Schorr and Waite (1967). Their method reverses pointers as it traces out linked structures. Then, when the end of a list is reached, the process can follow the pointers back out of the structure.
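For readers who want to run the process, the three phases and the recursive marker can be sketched in C++ as follows. The fixed six-cell heap, the single root pointer, and the use of a count in place of relinking cells onto the list of available space are all simplifying assumptions.

```cpp
#include <cassert>
#include <cstddef>

// Heap cell as assumed in the text: information, a marker, and two pointers.
struct Node {
    int info = 0;
    bool marker = false;
    Node* llink = nullptr;
    Node* rlink = nullptr;
};

const std::size_t HEAP_SIZE = 6;   // assumed tiny heap
static Node cells[HEAP_SIZE];

// Marking phase: recursive traversal starting from one pointer.
void mark(Node* ptr) {
    if (ptr != nullptr && !ptr->marker) {
        ptr->marker = true;
        mark(ptr->llink);
        mark(ptr->rlink);
    }
}

// Runs all three phases for one root and returns how many cells are reclaimed.
std::size_t mark_sweep(Node* root) {
    assert(root == nullptr || (root >= cells && root < cells + HEAP_SIZE));
    for (Node& cell : cells) cell.marker = false;  // phase 1: assume every cell is garbage
    mark(root);                                    // phase 2: mark the reachable cells
    std::size_t reclaimed = 0;
    for (Node& cell : cells)                       // phase 3: sweep the unmarked cells
        if (!cell.marker) ++reclaimed;             // stands in for relinking onto the free list
    return reclaimed;
}

// cells[0] reaches cells[1] and cells[2]; cells[3] through cells[5] are unreachable.
std::size_t mark_sweep_demo() {
    for (Node& cell : cells) cell = Node{};
    cells[0].llink = &cells[1];
    cells[0].rlink = &cells[2];
    cells[2].llink = &cells[0];   // a cycle does not confuse marking
    cells[3].llink = &cells[4];   // garbage that points at other garbage
    return mark_sweep(&cells[0]);
}
```

Unlike reference counting, the marker handles the cycle between cells[0] and cells[2] without difficulty, because a cell that is already marked is not visited again.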
The most serious problem with the original version of mark-sweep was that it was done too infrequently—only when a program had used all or nearly all of the heap storage. Mark-sweep in that situation takes a good deal of time, because most of the cells must be traced and marked as being currently used. This causes a significant delay in the progress of the application. Furthermore, the process may yield only a small number of cells that can be placed on the list of available space. This problem has been addressed in a variety of improvements. For example, incremental mark-sweep garbage collection occurs more frequently, long before memory is exhausted, making the process more effective in terms of the amount of storage that is reclaimed. Also, the time required for each run of the process is obviously shorter, thus reducing the delay in application execution. Another alternative is to perform the mark-sweep process on parts, rather than all of the memory associated with the application, at different times. This provides the same kinds of improvements as incremental mark-sweep.
Both the marking algorithms for the mark-sweep method and the processes required by the reference counter method can be made more efficient by use of the pointer rotation and slide operations that are described by Suzuki (1982).
Managing a heap from which variable-size cells are allocated has all the difficulties of managing one for single-size cells, but also has additional problems. Unfortunately, variable-size cells are required by most programming languages. The additional problems posed by variable-size cell management depend on the method used. If mark-sweep is used, the following additional problems occur:
The initial setting of the indicators of all cells in the heap to indicate that they are garbage is difficult. Because the cells are different sizes, scanning them is a problem. One solution is to require each cell to have the cell size as its first field. Then the scanning can be done, although it takes slightly more space and somewhat more time than its counterpart for fixed-size cells.
The marking process is nontrivial. How can a chain be followed from a pointer if there is no predefined location for the pointer in the pointed-to cell? Cells that do not contain pointers at all are also a problem. Adding an internal pointer to each cell, which is maintained in the background by the run-time system, will work. However, this background maintenance processing adds both space and execution time overhead to the cost of running the program.
Maintaining the list of available space is another source of overhead. The list can begin with a single cell consisting of all available space. Requests for segments simply reduce the size of this block. Reclaimed cells are added to the list. The problem is that before long, the list becomes a long list of various-size segments, or blocks. This slows allocation because requests cause the list to be searched for sufficiently large blocks. Eventually, the list may consist of a large number of very small blocks, which are not large enough for most requests. At this point, adjacent blocks may need to be collapsed into larger blocks. Alternatives to using the first sufficiently large block on the list can shorten the search but require the list to be ordered by block size. In either case, maintaining the list is additional overhead.
If reference counters are used, the first two problems are avoided, but the available-space list-maintenance problem remains.
For a comprehensive study of memory management problems, see Wilson (2005).
There are situations in programming when there is a need to be able to indicate that a variable does not currently have a value. Some older languages use zero as a nonvalue for numeric variables. This approach has the disadvantage of not being able to distinguish between when the variable is supposed to have the zero value and when the zero indicates that it has no value. Some newer languages provide types that can have a normal value or a special value to indicate that their variables have no value. Variables that have this capability are called optional types. Optional types are now directly supported in C#, F#, and Swift, among others.
C# has two categories of variables, value and reference types. Reference types, which are classes, are optional types by their nature. The null value indicates that a reference type has no value. Value types, which are all struct types, can be declared to be optional types, which allows them to have the value null. A variable is declared to be an optional type by following its type name with a question mark (?), as in
int? x;
To determine whether a variable has a normal value, it can be tested against null, as in
int? x;
. . .
if (x == null)
    Console.WriteLine("x has no value");
else
    Console.WriteLine("The value of x is: {0}", x);
Swift’s optional types are similar to those of C#, except that the nonvalue is named nil, instead of null. The Swift version of the above code is:
var x: Int?
. . .
if x == nil {
    print("x has no value")
} else {
    print("The value of x is: \(x!)")
}
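The same test can be sketched in C++, which is not among the languages named above but provides std::optional as a library type since C++17. The function names describe and optional_demo are hypothetical.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Describes an optional int in the same way as the C# and Swift fragments.
std::string describe(std::optional<int> x) {
    if (!x.has_value())
        return "x has no value";
    return "The value of x is: " + std::to_string(*x);
}

// Zero is an ordinary value here, distinct from "no value".
void optional_demo() {
    std::optional<int> x;            // starts out with no value
    assert(describe(x) == "x has no value");
    x = 0;
    assert(describe(x) == "The value of x is: 0");
}
```

This addresses the weakness of using zero as a nonvalue: an optional holding 0 and an empty optional are distinguishable.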
For our discussion of type checking, the concept of operands and operators is generalized to include subprograms and assignment statements. Subprograms will be thought of as operators whose operands are their parameters. The assignment symbol will be thought of as a binary operator, with its target variable and its expression being the operands.
Type checking is the activity of ensuring that the operands of an operator are of compatible types. A compatible type is one that either is legal for the operator or is allowed under language rules to be implicitly converted by compiler-generated code (or the interpreter) to a legal type. This automatic conversion is called a coercion. For example, if an int variable and a float variable are added in Java, the value of the int variable is coerced to float and a floating-point add is done.
A type error is the application of an operator to an operand of an inappropriate type. For example, in the original version of C, if an int value was passed to a function that expected a float value, a type error would occur (because compilers for that language did not check the types of parameters).
If all bindings of variables to types are static in a language, then type checking can nearly always be done statically. Dynamic type binding requires type checking at run time, which is called dynamic type checking.
Some languages, such as JavaScript and PHP, because of their dynamic type binding, allow only dynamic type checking. It is better to detect errors at compile time than at run time, because the earlier correction is usually less costly. The penalty for static checking is reduced programmer flexibility. Fewer shortcuts and tricks are possible. Such techniques, though, are now generally recognized to be error prone and detrimental to readability.
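Python also binds types dynamically, so it can illustrate dynamic type checking. In the following minimal sketch, the helper names `add` and `try_add` are hypothetical. The type error in the second call is detected only when the addition is actually executed.

```python
def add(a, b):
    """Apply + to two operands; their types are checked only when this runs."""
    return a + b


def try_add(a, b):
    """Return the sum, or the name of the run-time error that was raised."""
    try:
        return add(a, b)
    except TypeError as err:
        return type(err).__name__


print(try_add(2, 3))
print(try_add("2", 3))
```

A statically checked language would reject the second call before the program ran; here the mistake surfaces only on the execution path that reaches it.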
Type checking is complicated when a language allows a memory cell to store values of different types at different times during execution. Such memory cells can be created with C and C++ unions and the discriminated unions of ML, Haskell, and F#. In these cases, type checking, if done, must be dynamic and requires the run-time system to maintain the type of the current value of such memory cells. So, even though all variables are statically bound to types in languages such as C++, not all type errors can be detected by static type checking.
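The run-time bookkeeping described above can be sketched in Python. The class `TaggedCell` and its methods are hypothetical names chosen for illustration; the sketch models a cell that records the type of its current value and checks that tag on every read.

```python
class TaggedCell:
    """A memory cell that can hold values of different types over time.

    The cell keeps a tag (the type of the current value) so that each
    read can be checked at run time.
    """

    def __init__(self, value):
        self.store(value)

    def store(self, value):
        self._tag = type(value)   # remember the type of the current value
        self._value = value

    def load(self, expected_type):
        # Dynamic type check: compare the stored tag with the requested type.
        if self._tag is not expected_type:
            raise TypeError(
                f"cell holds {self._tag.__name__}, not {expected_type.__name__}"
            )
        return self._value


cell = TaggedCell(42)
print(cell.load(int))
cell.store(3.5)
print(cell.load(float))
```

A free union omits the tag, which is why a read through the wrong member cannot be detected.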
One of the ideas in language design that became prominent in the so-called structured-programming revolution of the 1970s was strong typing. Strong typing is widely acknowledged as being a highly valuable language characteristic. Unfortunately, it is often loosely defined, and it is sometimes used in computing literature without being defined at all.
A programming language is strongly typed if type errors are always detected. This requires that the types of all operands can be determined, either at compile time or at run time. The importance of strong typing lies in its ability to detect all misuses of variables that result in type errors. A strongly typed language also allows the detection, at run time, of uses of the incorrect type values in variables that can store values of more than one type.
C and C++ are not strongly typed languages because both include union types, which are not type checked.
ML is strongly typed, even though the types of some function parameters may not be known at compile time. F# is strongly typed.
Java and C#, although they are based on C++, are nearly strongly typed. Types can be explicitly cast, which could result in a type error. However, there are no implicit ways type errors can go undetected.
The coercion rules of a language have an important effect on the value of type checking. For example, expressions are strongly typed in Java. However, an arithmetic operator with one floating-point operand and one integer operand is legal. The value of the integer operand is coerced to floating-point, and a floating-point operation takes place. This is what is usually intended by the programmer. However, the coercion also results in a loss of one of the benefits of strong typing, namely error detection. For example, suppose a program had the int variables a and b and the float variable d. Now, if a programmer meant to type a + b, but mistakenly typed a + d, the error would not be detected by the compiler. The value of a would simply be coerced to float. So, the value of strong typing is weakened by coercion. Languages with a great deal of coercion, such as C and C++, are less reliable than those with no coercion, such as ML and F#. Java and C# have half as many assignment type coercions as C++, so their error detection is better than that of C++, but still not nearly as effective as that of ML and F#. The issue of coercion is examined in detail in Chapter 7.
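The a + d mistake can be reproduced in Python, whose mixed int and float arithmetic behaves like the coercion described above. The variable values are arbitrary and chosen only for illustration.

```python
a, b = 3, 4      # the intended operands, both int
d = 2.5          # a float that happens to be in scope

intended = a + b   # int + int
mistyped = a + d   # int + float: the int operand is converted, no error is reported

print(type(intended).__name__, intended)
print(type(mistyped).__name__, mistyped)
```

Both additions are accepted silently, so the only visible symptom of the typo is a result of a different type and value.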
The idea of type compatibility was defined when the issue of type checking was introduced. The compatibility rules dictate the types of operands that are acceptable for each of the operators and thereby specify the possible type errors of the language. The rules are called compatibility because in some cases the type of an operand can be implicitly converted by the compiler or run-time system to make it acceptable for the operator.
The type compatibility rules are simple and rigid for the predefined scalar types. However, in the cases of structured types, such as arrays and records and some user-defined types, the rules are more complex. Coercion of these types is rare, so the issue is not type compatibility, but type equivalence. That is, two types are equivalent if an operand of one type in an expression can be substituted for one of the other type, without coercion. Type equivalence is a strict form of type compatibility—compatibility without coercion. The central issue here is how type equivalence is defined.
The design of the type equivalence rules of a language is important, because it influences the design of the data types and the operations provided for values of those types. With the types discussed here, there are very few predefined operations. Perhaps the most important result of two variables being of equivalent types is that either one can have its value assigned to the other.
There are two approaches to defining type equivalence: name type equivalence and structure type equivalence. Name type equivalence means that two variables have equivalent types if they are defined either in the same declaration or in declarations that use the same type name. Structure type equivalence means that two variables have equivalent types if their types have identical structures. There are some variations of these two approaches, and many languages use combinations of them.
Name type equivalence is easy to implement but is more restrictive. Under a strict interpretation, a variable whose type is a subrange of the integers would not be equivalent to an integer type variable. For example, supposing Ada used strict name type equivalence, consider the following Ada code:
type Indextype is range 1..100;
count : Integer;
index : Indextype;
The types of the variables count and index would not be equivalent; count could not be assigned to index or vice versa.
Another problem with name type equivalence arises when a structured or user-defined type is passed among subprograms through parameters. Such a type must be defined only once, globally. A subprogram cannot state the type of such formal parameters in local terms. This was the case with the original version of Pascal.
Note that to use name type equivalence, all types must have names. Most languages allow users to define types that are anonymous—they do not have names. For a language to use name type equivalence, such types must implicitly be given internal names by the compiler.
Structure type equivalence is more flexible than name type equivalence, but it is more difficult to implement. Under name type equivalence, only the two type names must be compared to determine equivalence. Under structure type equivalence, however, the entire structures of the two types must be compared. This comparison is not always simple. (Consider a data structure that refers to its own type, such as a linked list.) Other questions can also arise. For example, are two record (or struct) types equivalent if they have the same structure but different field names? Are two single-dimensioned array types in a language that allows lower bounds of array subscript ranges to be set in their declarations equivalent if they have the same element type but have subscript ranges of 0..10 and 1..11? Are two enumeration types equivalent if they have the same number of components but spell the literals differently?
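To make the difficulty with self-referential types concrete, here is a minimal Python sketch of a structural comparison. The type representation (strings for primitives, `("ref", name)`, `("ptr", target)`, and `("record", fields)` tuples) and the function name `struct_equiv` are assumptions made for this illustration. The sketch ignores field names, which is only one possible answer to the question raised above.

```python
def struct_equiv(t1, t2, env, assumed=None):
    """Decide structural equivalence of two type descriptions.

    env maps type names to their declared structures. Pairs that are
    already being compared are assumed equivalent, which keeps the
    comparison from looping forever on self-referential types.
    """
    if assumed is None:
        assumed = set()
    if (t1, t2) in assumed:
        return True
    assumed.add((t1, t2))
    # Replace a reference to a named type by its declared structure.
    if not isinstance(t1, str) and t1[0] == "ref":
        return struct_equiv(env[t1[1]], t2, env, assumed)
    if not isinstance(t2, str) and t2[0] == "ref":
        return struct_equiv(t1, env[t2[1]], env, assumed)
    if isinstance(t1, str) or isinstance(t2, str):
        return t1 == t2          # primitive types must match exactly
    if t1[0] != t2[0]:
        return False             # different type constructors
    if t1[0] == "ptr":
        return struct_equiv(t1[1], t2[1], env, assumed)
    if t1[0] == "record":
        f1, f2 = t1[1], t2[1]    # fields are (field_name, type) pairs
        return len(f1) == len(f2) and all(
            struct_equiv(a[1], b[1], env, assumed) for a, b in zip(f1, f2)
        )
    return False


env = {
    "NodeA": ("record", (("value", "int"), ("next", ("ptr", ("ref", "NodeA"))))),
    "NodeB": ("record", (("data", "int"), ("link", ("ptr", ("ref", "NodeB"))))),
    "NodeC": ("record", (("data", "float"), ("link", ("ptr", ("ref", "NodeC"))))),
}
print(struct_equiv(("ref", "NodeA"), ("ref", "NodeB"), env))
print(struct_equiv(("ref", "NodeA"), ("ref", "NodeC"), env))
```

Under name type equivalence the whole function would reduce to comparing two names, which is the implementation cost difference the text describes.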
Another difficulty with structure type equivalence is that it disallows differentiating between types with the same structure. For example, consider the following Ada-like declarations:
type Celsius = Float;
     Fahrenheit = Float;
Variables of these two types are considered equivalent under structure type equivalence, which allows them to be mixed in expressions. That is surely undesirable in this case, considering the difference indicated by the type names. In general, types with different names are likely to be abstractions of different categories of problem values and should not be considered equivalent.
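The contrast between the two approaches can be sketched in a few lines of Python. The `TypeDecl` class and both predicates are hypothetical; they model a declaration as a name plus a structure description and compare either one or the other.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TypeDecl:
    """A hypothetical type declaration: a name plus a structure description."""
    name: str
    structure: tuple  # for example ("float",) or ("array", 1, 10, "int")


def name_equivalent(t1: TypeDecl, t2: TypeDecl) -> bool:
    # Name type equivalence: only the type names are compared.
    return t1.name == t2.name


def structure_equivalent(t1: TypeDecl, t2: TypeDecl) -> bool:
    # Structure type equivalence: only the structures are compared.
    return t1.structure == t2.structure


celsius = TypeDecl("Celsius", ("float",))
fahrenheit = TypeDecl("Fahrenheit", ("float",))

print(name_equivalent(celsius, fahrenheit))
print(structure_equivalent(celsius, fahrenheit))
```

Only the name-based predicate keeps the two temperature scales apart, which is the distinction the type names were meant to express.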
Ada uses a restrictive form of name type equivalence but provides two type constructs, subtypes and derived types, that avoid the problems associated with name type equivalence. A derived type is a new type that is based on some previously defined type with which it is not equivalent, although it may have identical structure. Derived types inherit all the properties of their parent types. Consider the following example:
type Celsius is new Float;
type Fahrenheit is new Float;
The types of variables of these two derived types are not equivalent, although their structures are identical. Furthermore, variables of neither type are type equivalent with any other floating-point type. Literals are exempt from the rule. A literal such as 3.0 has the type universal real and is type equivalent to any floating-point type. Derived types can also include range constraints on the parent type, while still inheriting all of the parent’s operations.
An Ada subtype is a possibly range-constrained version of an existing type. A subtype is type equivalent with its parent type. For example, consider the following declaration:
subtype Small_type is Integer range 0..99;
The type Small_type is equivalent to the type Integer.
Note that Ada’s derived types are very different from Ada’s subrange types. For example, consider the following type declarations:
type Derived_Small_Int is new Integer range 1..100;
subtype Subrange_Small_Int is Integer range 1..100;
Variables of both types, Derived_Small_Int and Subrange_Small_Int, have the same range of legal values and both inherit the operations of Integer. However, variables of type Derived_Small_Int are not compatible with any Integer type. On the other hand, variables of type Subrange_Small_Int are compatible with variables and constants of Integer type and any subtype of Integer.
For variables of an Ada unconstrained array type, structure type equivalence is used. For example, consider the following type declaration and two object declarations:
type Vector is array (Integer range <>) of Integer;
Vector_1: Vector (1..10);
Vector_2: Vector (11..20);
The types of these two objects are equivalent, even though they have different names and different subscript ranges, because for objects of unconstrained array types, structure type equivalence rather than name type equivalence is used. Because both types have 10 elements and the elements of both are of type Integer, they are type equivalent.
For constrained anonymous types, Ada uses a highly restrictive form of name type equivalence. Consider the following Ada declarations of constrained anonymous types:
A : array (1..10) of Integer;
In this case, A has an anonymous but unique type assigned by the compiler and unavailable to the program. If we also had
B : array (1..10) of Integer;
A and B would be of anonymous but distinct and not equivalent types, though they are structurally identical. The multiple declaration
C, D : array (1..10) of Integer;
creates two anonymous types, one for C and one for D, which are not equivalent. This declaration is actually treated as if it were the following two declarations:
C : array (1..10) of Integer;
D : array (1..10) of Integer;
Note that Ada’s form of name type equivalence is more restrictive than the name type equivalence that is defined at the beginning of this section. If we had written instead
type List_10 is array (1..10) of Integer;
C, D : List_10;
then the types of C and D would be equivalent.
Name type equivalence works well for Ada, in part because all types, except anonymous arrays, are required to have type names (and anonymous types are given internal names by the compiler).
Type equivalence rules for Ada are more rigid than those for languages that have many coercions among types. For example, the two operands of an addition operator in Java can have virtually any combination of numeric types in the language. One of the operands will simply be coerced to the type of the other. But in Ada, there are no coercions of the operands of an arithmetic operator.
C uses both name and structure type equivalence. Every struct, enum, and union declaration creates a new type that is not equivalent to any other type. So, name type equivalence is used for structure, enumeration, and union types. Other nonscalar types use structure type equivalence. Array types are equivalent if they have the same type components. Also, if an array type has a constant size, it is equivalent either to other arrays with the same constant size or to those without a constant size. Note that typedef in C and C++ does not introduce a new type; it simply defines a new name for an existing type. So, any type defined with typedef is type equivalent to its parent type. One exception to C using name type equivalence for structures, enumerations, and unions is if two structures, enumerations, or unions are defined in different files, in which case structural type equivalence is used. This is a loophole in the name type equivalence rule to allow equivalence of structures, enumerations, and unions that are defined in different files.
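The difference between a new name and a new type can be illustrated with a loose Python analogy; it does not demonstrate C's rules themselves. Binding a second name to an existing type behaves like typedef, while each class statement creates a distinct type even when the structure is identical, much as each struct declaration does. The names `Meters`, `Point`, and `Pixel` are hypothetical.

```python
Meters = float            # an alias: a new name for an existing type


class Point:              # each class statement creates a distinct type
    def __init__(self, x, y):
        self.x, self.y = x, y


class Pixel:              # same structure as Point, but a different type
    def __init__(self, x, y):
        self.x, self.y = x, y


print(Meters is float)
print(isinstance(Point(1, 2), Pixel))
```

The alias is indistinguishable from the original type, whereas the two structurally identical classes are never confused with each other.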
C++ is like C except there is no exception for structures and unions defined in different files.
In languages that do not allow users to define and name types, such as Fortran and COBOL, name equivalence obviously cannot be used.
Object-oriented languages such as Java and C++ bring another kind of type compatibility issue with them. The issue is object compatibility and its relationship to the inheritance hierarchy, which is discussed in Chapter 12.
Type compatibility in expressions is discussed in Chapter 7; type compatibility for subprogram parameters is discussed in Chapter 9.
Type theory is a broad area of study in mathematics, logic, computer science, and philosophy. It began in mathematics in the early 1900s and later became a standard tool in logic. Any general discussion of type theory is necessarily complex, lengthy, and highly abstract. Even when restricted to computer science, type theory includes such diverse and complex subjects as typed lambda calculus, combinators, the metatheory of bounded quantification, existential types, and higher-order polymorphism. All these topics are far beyond the scope of this book.
In computer science there are two branches of type theory: practical and abstract. The practical branch is concerned with data types in commercial programming languages; the abstract branch primarily focuses on typed lambda calculus, an area of extensive research by theoretical computer scientists over the past half century. This section is restricted to a brief description of some of the mathematical formalisms that underlie data types in programming languages.
A data type defines a set of values and a collection of operations on those values. A type system is a set of types and the rules that govern their use in programs. Obviously, every typed programming language defines a type system. The formal model of a type system of a programming language consists of a set of types and a collection of functions that define the type rules of the language, which are used to determine the type of any expression. A formal system that describes the rules of a type system, attribute grammars, is introduced in Chapter 3.
An alternative model to attribute grammars uses a type map and a collection of functions, not associated with grammar rules, that specify the type rules. A type map is similar to the state of a program used in denotational semantics, consisting of a set of ordered pairs, with the first element of each pair being a variable’s name and the second element being its type. A type map is constructed using the type declarations in the program. In a statically typed language, the type map need only be maintained during compilation, although it changes as the program is analyzed by the compiler. If any type checking is done dynamically, the type map must be maintained during execution. The concrete version of a type map in a compilation system is the symbol table, constructed primarily by the lexical and syntax analyzers. Dynamic types sometimes are maintained with tags attached to values or objects.
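A type map and one type-rule function can be sketched as follows. This is a minimal illustration, not a model of any particular language: the function names, the tuple encoding of expressions, and the three rules in the docstring are all assumptions.

```python
def build_type_map(declarations):
    """Build a type map, a set of (name, type) pairs, from declarations."""
    return dict(declarations)


def type_of(expr, type_map):
    """Return the type of a small expression under assumed type rules.

    An expression is a variable name or a tuple (op, left, right).
    Assumed rules: int op int is int; any mix of int and float is
    float; every other combination is a type error.
    """
    if isinstance(expr, str):
        if expr not in type_map:
            raise TypeError(f"undeclared variable {expr!r}")
        return type_map[expr]
    op, left, right = expr
    lt, rt = type_of(left, type_map), type_of(right, type_map)
    if lt == rt == "int":
        return "int"
    if {lt, rt} <= {"int", "float"}:
        return "float"
    raise TypeError(f"operator {op!r} is not defined for {lt} and {rt}")


type_map = build_type_map([("a", "int"), ("b", "int"), ("d", "float")])
print(type_of(("+", "a", "b"), type_map))
print(type_of(("+", "a", "d"), type_map))
```

The dictionary plays the role of the symbol table, and `type_of` is one of the functions that "specify the type rules" independently of any grammar.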
As stated previously, a data type is a set of values, although in a data type the elements are often ordered. For example, the elements in all enumeration types are ordered. However, in a mathematical set the elements are not ordered. Despite this difference, set operations can be used on data types to describe new data types. The structured data types of programming languages are defined by type operators, or constructors that correspond to set operations. These set operations/type constructors are briefly introduced in the following paragraphs.
A finite mapping is a function from a finite set of values, the domain set, onto values in the range set. Finite mappings model two different categories of types in programming languages, functions and arrays, although in some languages functions are not types. All languages include arrays, which are defined in terms of a mapping function that maps indices to elements in the array. For traditional arrays, the mapping is simple: integer values are mapped to the addresses of array elements. For associative arrays, the mapping is defined by a function that describes a hashing operation. The hashing function maps the keys of the associative array, usually character strings, to the addresses of the array elements.
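Both kinds of mapping can be written as small Python functions. This is a sketch under stated assumptions: `element_address` assumes contiguous storage with a fixed element size, and `bucket_index` uses a made-up multiplicative hash that maps a key to a slot number rather than to a real address. Neither reflects a specific implementation.

```python
def element_address(base, lower_bound, element_size, index):
    """Map an index of a one-dimensional array to an element address."""
    return base + (index - lower_bound) * element_size


def bucket_index(key, table_size):
    """Map a string key to a slot number with a simple, made-up hash."""
    total = 0
    for ch in key:
        total = (total * 31 + ord(ch)) % table_size
    return total


print(element_address(base=1000, lower_bound=1, element_size=4, index=3))
print(bucket_index("apple", 16))
```

In both cases a value from a finite domain (an index or a key) is mapped to a storage location, which is what makes arrays finite mappings.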
A Cartesian, or cross, product of n sets, $S_1, S_2, \ldots, S_n$, is a set denoted $S_1 \times S_2 \times \cdots \times S_n$. Each element of the Cartesian product set has one element from each of the constituent sets. So, $S_1 \times S_2 = \{(x, y) \mid x \in S_1 \text{ and } y \in S_2\}$. For example, if $S_1 = \{1, 2\}$ and $S_2 = \{a, b\}$, then $S_1 \times S_2 = \{(1, a), (1, b), (2, a), (2, b)\}$. A Cartesian product defines tuples in mathematics, which appear in Python, ML, Swift, and F# as a data type (see Section 6.5). Cartesian products also model records, or structs, although not exactly. Cartesian products do not have element names, but records require them. For example, consider the following C struct:
struct intFloat {
    int myInt;
    float myFloat;
};
This struct defines the Cartesian product type int × float. The names of the elements are myInt and myFloat.
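A Cartesian product of two small sets can be enumerated directly in Python. The sets below are chosen only for illustration, and the `IntFloat` named tuple is a hypothetical stand-in for the C struct: it adds element names on top of the bare pair.

```python
from collections import namedtuple
from itertools import product

s1 = {1, 2}
s2 = {"a", "b"}

# Every pair takes one element from each constituent set.
pairs = set(product(s1, s2))
print(sorted(pairs))

# A record is a tuple whose positions also have names.
IntFloat = namedtuple("IntFloat", ["myInt", "myFloat"])
r = IntFloat(3, 2.5)
print(r.myInt, r[0])
```

The named tuple can be read both by position and by name, which shows where the Cartesian product model stops and the record begins.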
The union of two sets, $S_1$ and $S_2$, is defined as $S_1 \cup S_2 = \{x \mid x \in S_1 \text{ or } x \in S_2\}$. Set union models the union data types, as described in Section 6.10.
Mathematical subsets are defined by providing a rule that elements must follow. These sets model the subtypes of Ada, although not exactly, because subtypes must consist of contiguous elements of their parent sets. Elements of mathematical sets are unordered, so the model is not perfect.
Notice that pointers, defined with type operators, such as the asterisk in C, are not defined in terms of a set operation.
This concludes our discussion of formalisms in data types, as well as our whole discussion of data types.
The data types of a language are a large part of what determines that language’s style and usefulness. Along with control structures, they form the heart of a language.
The primitive data types of most imperative languages include numeric, character, and Boolean types. The numeric types are often directly supported by hardware.
The user-defined enumeration and subrange types are convenient and add to the readability and reliability of programs.
Arrays are part of most programming languages. The relationship between a reference to an array element and the address of that element is given in an access function, which is an implementation of a mapping. Arrays can be static, as in C++ arrays whose definition includes the static specifier; fixed stack-dynamic, as in arrays declared in C functions without the static specifier; fixed heap-dynamic, as with Java’s array objects; or heap-dynamic, as in Perl’s arrays. Most languages allow only a few operations on complete arrays.
Records are now included in most languages. Fields of records are specified in a variety of ways. In the case of COBOL, they can be referenced without naming all of the enclosing records, although this is messy to implement and harmful to readability. In several languages that support object-oriented programming, records are supported with objects.
Tuples are similar to records, but do not have names for their constituent parts. They are part of Python, ML, and F#.
Lists are staples of the functional programming languages, but are now also included in Python and C#.
Unions are locations that can store different type values at different times. Discriminated unions include a tag to record the current type value. A free union is one without the tag. Most languages with unions do not have safe designs for them, the exceptions being ML, Swift, and F#.
Pointers are used for addressing flexibility and to control dynamic storage management. Pointers have some inherent dangers: Dangling pointers are difficult to avoid, and memory leakage can occur.
Reference types, such as those in Java and C#, provide heap management without the dangers of pointers.
Enumeration and record types are relatively easy to implement. Arrays are also straightforward, although array element access is an expensive process when the array has several subscripts. The access function requires one addition and one multiplication for each subscript.
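The cost of the access function can be seen in a short Python sketch. It assumes row major order and subscripts that start at 0; the function name `row_major_offset` is hypothetical. Each loop step performs one multiplication and one addition, so the work grows with the number of subscripts.

```python
def row_major_offset(subscripts, sizes):
    """Offset, in elements, of an element of a row-major array.

    subscripts and sizes are equal-length sequences, and subscripts
    are assumed to start at 0.
    """
    offset = 0
    for index, size in zip(subscripts, sizes):
        if not 0 <= index < size:
            raise IndexError("subscript out of range")
        offset = offset * size + index   # one multiply and one add per subscript
    return offset


print(row_major_offset((1, 2), (3, 4)))
print(row_major_offset((1, 2, 3), (2, 3, 4)))
```

Multiplying the resulting offset by the element size and adding the base address of the array gives the element's address.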
Pointers are relatively easy to implement, if heap management is not considered. Heap management is easy if all cells have the same size but is complicated for variable-size cell allocation and deallocation.
Optional type variables are variables that allow a nonvalue to be stored. This allows a program to indicate when a variable currently has no value.
Strong typing is the concept of requiring that all type errors be detected. The value of strong typing is increased reliability.
The type equivalence rules of a language determine what operations are legal among the structured types of a language. Name type equivalence and structure type equivalence are the two fundamental approaches to defining type equivalence.
Type theories have been developed in many areas. In computer science, the practical branch of type theory defines the types and type rules of programming languages. Set theory can be used to model most of the structured data types in programming languages.
A wealth of literature exists that is concerned with data type design, use, and implementation. Hoare gives one of the earliest systematic definitions of structured types in Dahl et al. (1972). A general discussion of a wide variety of data types is given in Cleaveland (1986).
Implementing run-time checks on the possible insecurities of Pascal data types is discussed in Fischer and LeBlanc (1980). Most compiler design books, such as Fischer and LeBlanc (1991) and Aho et al. (1986), describe implementation methods for data types, as do the other programming language texts, such as Pratt and Zelkowitz (2001) and Scott (2009). A detailed discussion of the problems of heap management can be found in Tenenbaum et al. (1990). Garbage-collection methods are developed by Schorr and Waite (1967) and Deutsch and Bobrow (1976). A comprehensive discussion of garbage-collection algorithms can be found in Cohen (1981) and Wilson (2005).
What is a descriptor?
What are the advantages and disadvantages of decimal data types?
What are the design issues for character string types?
Describe the three string length options.
Define ordinal, enumeration, and subrange types.
What are the advantages of user-defined enumeration types?
In what ways are the user-defined enumeration types of C# more reliable than those of C++?
What are the design issues for arrays?
Define static, fixed stack-dynamic, fixed heap-dynamic, and heap-dynamic arrays. What are the advantages of each?
What happens when a nonexistent element of an array is referenced in Perl?
How does JavaScript support sparse arrays?
What languages support negative subscripts?
What languages support array slices with stepsizes?
What is an aggregate constant?
Define row major order and column major order.
What is an access function for an array?
What are the required entries in a Java array descriptor, and when must they be stored (at compile time or run time)?
What is the structure of an associative array?
What is the purpose of level numbers in COBOL records?
Define fully qualified and elliptical references to fields in records.
What is the primary difference between a record and a tuple?
Are the tuples of Python mutable?
What is the purpose of an F# tuple pattern?
In what primarily imperative language do lists serve as arrays?
What is the action of the Scheme function CAR?
What is the action of the F# function tl?
In what way does Scheme’s CDR function modify its parameter?
On what are Python’s list comprehensions based?
Define union, free union, and discriminated union.
Are the unions of F# discriminated?
What are the design issues for pointer types?
What are the two common problems with pointers?
Why are the pointers of most languages restricted to pointing at a single type variable?
What is a C++ reference type, and what is its common use?
Why are reference variables in C++ better than pointers for formal parameters?
What advantages do Java and C# reference type variables have over the pointers in other languages?
Describe the lazy and eager approaches to reclaiming garbage.
Why wouldn’t arithmetic on Java and C# references make sense?
What is a compatible type?
Define type error.
Define strongly typed.
Why is Java not strongly typed?
What is a nonconverting cast?
What languages have no type coercions?
Why are C and C++ not strongly typed?
What is name type equivalence?
What is structure type equivalence?
What is the primary advantage of name type equivalence?
What is the primary disadvantage to structure type equivalence?
For what types does C use structure type equivalence?
What set operation models C’s struct data type?
What are the arguments for and against representing Boolean values as single bits in memory?
How does a decimal value waste memory space?
VAX minicomputers use a format for floating-point numbers that is not the same as the IEEE standard. What is this format, and why was it chosen by the designers of the VAX computers? A reference for VAX floating-point representations is Sebesta (1991).
Compare the tombstone and lock-and-key methods of avoiding dangling pointers, from the points of view of safety and implementation cost.
What disadvantages are there in implicit dereferencing of pointers, but only in certain contexts?
Explain all of the differences between Ada’s subtypes and derived types.
What significant justification is there for the -> operator in C and C++?
What are all of the differences between the enumeration types of C++ and those of Java?
Multidimensional arrays can be stored in row major order, as in C++, or in column major order, as in Fortran. Develop the access functions for both of these arrangements for three-dimensional arrays.
In the Burroughs Extended ALGOL language, matrices are stored as a single-dimensioned array of pointers to the rows of the matrix, which are treated as single-dimensioned arrays of values. What are the advantages and disadvantages of such a scheme?
Analyze and write a comparison of C’s malloc and free functions with C++’s new and delete operators. Use safety as the primary consideration in the comparison.
Analyze and write a comparison of using C++ pointers and Java reference variables to refer to fixed heap-dynamic variables. Use safety and convenience as the primary considerations in the comparison.
Write a short discussion of what was lost and what was gained in Java’s designers’ decision to not include the pointers of C++.
What are the arguments for and against Java’s implicit heap storage recovery, when compared with the explicit heap storage recovery required in C++? Consider real-time systems.
What are the arguments for the inclusion of enumeration types in C#, although they were not in the first few versions of Java?
What would you expect to be the level of use of pointers in C#? How often will they be used when it is not absolutely necessary?
Make two lists of applications of matrices, one for those that require jagged matrices and one for those that require rectangular matrices. Now, argue whether just jagged, just rectangular, or both should be included in a programming language.
Compare the string manipulation capabilities of the class libraries of C++, Java, and C#.
Look up the definition of strongly typed as given in Gehani (1983) and compare it with the definition given in this chapter. How do they differ?
In what way is static type checking better than dynamic type checking?
Explain how coercion rules can weaken the beneficial effect of strong typing.
Design a set of simple test programs to determine the type compatibility rules of a C compiler to which you have access. Write a report of your findings.
Determine whether some C compiler to which you have access implements the free function.
Write a program that does matrix multiplication in some language that does subscript range checking and for which you can obtain an assembly language or machine language version from the compiler. Determine the number of instructions required for the subscript range checking and compare it with the total number of instructions for the matrix multiplication process.
If you have access to a compiler in which the user can specify whether subscript range checking is desired, write a program that does a large number of matrix accesses and time their execution. Run the program with subscript range checking and without it, and compare the times.
Write a simple program in C++ to investigate the safety of its enumeration types. Include at least 10 different operations on enumeration types to determine what incorrect or just silly things are legal. Now, write a C# program that does the same things and run it to determine how many of the incorrect or silly things are legal. Compare your results.
Write a program in C++ or C# that includes two different enumeration types and has a significant number of operations using the enumeration types. Also write the same program using only integer variables. Compare the readability and predict the reliability differences between the two programs.
Write a C program that does a large number of references to elements of two-dimensioned arrays, using only subscripting. Write a second program that does the same operations but uses pointers and pointer arithmetic for the storage-mapping function to do the array references. Compare the time efficiency of the two programs. Which of the two programs is likely to be more reliable? Why?
Write a Perl program that uses a hash and a large number of operations on the hash. For example, the hash could store people’s names and their ages. A random-number generator could be used to create three-character names and ages, which could be added to the hash. When a duplicate name was generated, it would cause an access to the hash but not add a new element. Rewrite the same program without using hashes. Compare the execution efficiency of the two. Compare the ease of programming and readability of the two.
Write a program in the language of your choice that behaves differently if the language used name equivalence than if it used structural equivalence.
For what types of A and B is the simple assignment statement A = B legal in C++ but not Java?
As the title indicates, the topic of this chapter is expressions and assignment statements. The semantic rules that determine the order of evaluation of operators in expressions are discussed first. This is followed by an explanation of a potential problem with operand evaluation order when functions can have side effects. Overloaded operators, both predefined and user defined, are then discussed, along with their effects on the expressions in programs. Next, mixed-mode expressions are described and evaluated. This leads to the definition and evaluation of widening and narrowing type conversions, both implicit and explicit. Relational and Boolean expressions are then discussed, including the process of short-circuit evaluation. Finally, the assignment statement, from its simplest form to all of its variations, is covered, including assignments as expressions and mixed-mode assignments.
Character string pattern-matching expressions were covered as a part of the material on character strings in Chapter 6, so they are not mentioned in this chapter.
Expressions are the fundamental means of specifying computations in a programming language. It is crucial for a programmer to understand both the syntax and semantics of expressions of the language he or she uses. A formal mechanism (BNF) for describing the syntax of expressions was introduced in Chapter 3. In this chapter, the semantics of expressions are discussed.
To understand expression evaluation, it is necessary to be familiar with the orders of operator and operand evaluation. The operator evaluation order of expressions is dictated by the associativity and precedence rules of the language. Although the value of an expression sometimes depends on it, the order of operand evaluation in expressions is often unstated by language designers. This allows implementors to choose the order, which leads to the possibility of programs producing different results in different implementations. Other issues in expression semantics are type mismatches, coercions, and short-circuit evaluation.
The essence of the imperative programming languages is the dominant role of assignment statements. The purpose of these statements is to cause the side effect of changing the values of variables, or the state, of the program. So an integral part of all imperative languages is the concept of variables whose values change during program execution.
Functional languages use variables of a different sort, such as the parameters of functions. These languages also have declaration statements that bind values to names. These declarations are similar to assignment statements, but do not have side effects.
Automatic evaluation of arithmetic expressions similar to those found in mathematics, science, and engineering was one of the primary goals of the first high-level programming languages. Most of the characteristics of arithmetic expressions in programming languages were inherited from conventions that had evolved in mathematics. In programming languages, arithmetic expressions consist of operators, operands, parentheses, and function calls. An operator can be unary, meaning it has a single operand, binary, meaning it has two operands, or ternary, meaning it has three operands.
In most programming languages, binary operators are infix, which means they appear between their operands. One exception is Perl, which has some operators that are prefix, which means they precede their operands. In Scheme and Lisp, all operators are prefix. Most unary operators are prefix, but the ++ and -- operators of C-based languages can be either prefix or postfix.
The purpose of an arithmetic expression is to specify an arithmetic computation. An implementation of such a computation must cause two actions: fetching the operands, usually from memory, and executing arithmetic operations on those operands. In the following sections, we investigate the common design details of arithmetic expressions.
Following are the primary design issues for arithmetic expressions, all of which are discussed in this section:
What are the operator precedence rules?
What are the operator associativity rules?
What is the order of operand evaluation?
Are there restrictions on operand evaluation side effects?
Does the language allow user-defined operator overloading?
What type mixing is allowed in expressions?
The operator precedence and associativity rules of a language dictate the order of evaluation of its operators.
The value of an expression depends at least in part on the order of evaluation of the operators in the expression. Consider the following expression:
a + b * c
Suppose the variables a, b, and c have the values 3, 4, and 5, respectively. If evaluated left to right (the addition first and then the multiplication), the result is 35. If evaluated right to left, the result is 23.
Instead of simply evaluating the operators in an expression from left to right or right to left, mathematicians long ago developed the concept of placing operators in a hierarchy of evaluation priorities and basing the evaluation order of expressions partly on this hierarchy. For example, in mathematics, multiplication is considered to be of higher priority than addition, perhaps due to its higher level of complexity. If that convention were applied in the previous example expression, as would be the case in most programming languages, the multiplication would be done first.
The operator precedence rules for expression evaluation partially define the order in which the operators of different precedence levels are evaluated. The operator precedence rules for expressions are based on the hierarchy of operator priorities, as seen by the language designer. The operator precedence rules of the common imperative languages are nearly all the same, because they are based on those of mathematics. In these languages, exponentiation has the highest precedence (when it is provided by the language), followed by multiplication and division on the same level, followed by binary addition and subtraction on the same level.
Many languages also include unary versions of addition and subtraction. Unary addition is called the identity operator because it usually has no associated operation and thus has no effect on its operand. Ellis and Stroustrup (1990, p. 56), speaking about C++, call it a historical accident and correctly label it useless. Unary minus, of course, changes the sign of its operand. In Java and C#, unary minus also causes the implicit conversion of short and byte operands to int type.
In all of the common imperative languages, the unary minus operator can appear in an expression either at the beginning or anywhere inside the expression, as long as it is parenthesized to prevent it from being next to another operator. For example,
a + (- b) * c
is legal, but
a + - b * c
usually is not.
Next, consider the following expressions:
- a / b
- a * b
- a ** b
In the first two cases, the relative precedence of the unary minus operator and the binary operator is irrelevant—the order of evaluation of the two operators has no effect on the value of the expression. In the last case, however, it does matter.
Of the common programming languages, only Fortran, Ruby, Visual Basic, and Ada have the exponentiation operator. In all four, exponentiation has higher precedence than unary minus, so
- A ** B
is equivalent to
-(A ** B)
The precedences of the arithmetic operators of Ruby and the C-based languages are as follows:
The ** operator is exponentiation. The % operator takes two integer operands and yields the remainder of the first after division by the second. The ++ and -- operators of the C-based languages are described in Section 7.7.4.
APL is odd among languages because it has a single level of precedence, as illustrated in the next section.
Precedence accounts for only some of the rules for the order of operator evaluation; associativity rules also affect it.
Consider the following expression:
a - b + c - d
If the addition and subtraction operators have the same level of precedence, as they do in programming languages, the precedence rules say nothing about the order of evaluation of the operators in this expression.
When an expression contains two adjacent occurrences of operators with the same level of precedence, the question of which operator is evaluated first is answered by the associativity rules of the language. An operator can have either left or right associativity, meaning that when there are two adjacent operators with the same precedence, the left operator is evaluated first or the right operator is evaluated first, respectively.
Associativity in common languages is left to right, except that the exponentiation operator (when provided) sometimes associates right to left. In the Java expression
a - b + c
the left operator is evaluated first.
Exponentiation in Fortran and Ruby is right associative, so in the expression
A ** B ** C
the right operator is evaluated first.
In Visual Basic, the exponentiation operator, ^, is left associative.
The associativity rules for a few common languages are given here:
As stated in Section 7.2.1.1, in APL, all operators have the same level of precedence. Thus, the order of evaluation of operators in APL expressions is determined entirely by the associativity rule, which is right to left for all operators. For example, in the expression
A × B + C
the addition operator is evaluated first, followed by the multiplication operator (× is the APL multiplication operator). If A were 3, B were 4, and C were 5, then the value of this APL expression would be 27.
Many compilers for the common languages make use of the fact that some arithmetic operators are mathematically associative, meaning that the associativity rules have no impact on the value of an expression containing only those operators. For example, addition is mathematically associative, so in mathematics the value of the expression
A + B + C
does not depend on the order of operator evaluation. If floating-point operations for mathematically associative operations were also associative, the compiler could use this fact to perform some simple optimizations. Specifically, if the compiler is allowed to reorder the evaluation of operators, it may be able to produce slightly faster code for expression evaluation. Compilers commonly do these kinds of optimizations.
Unfortunately, in a computer, both floating-point representations and floating-point arithmetic operations are only approximations of their mathematical counterparts (because of size limitations). The fact that a mathematical operator is associative does not necessarily imply that the corresponding computer floating-point operation is associative. In fact, only if all the operands and intermediate results can be exactly represented in floating-point notation will the process be precisely associative. Even integer addition on a computer is not associative in pathological situations. For example, suppose that a program must evaluate the expression
A + B + C + D
and that A and C are very large positive numbers, and B and D are negative numbers with very large absolute values. In this situation, adding B to A does not cause an overflow exception, but adding C to A does. Likewise, adding C to B does not cause overflow, but adding D to B does. Because of the limitations of computer arithmetic, addition is catastrophically nonassociative in this case. Therefore, if the compiler reorders these addition operations, it affects the value of the expression. This problem, of course, can be avoided by the programmer, assuming the approximate values of the variables are known. The programmer can specify the expression in two parts (in two assignment statements), ensuring that overflow is avoided. However, this situation can arise in far more subtle ways, in which the programmer is less likely to notice the order dependence.
Programmers can alter the precedence and associativity rules by placing parentheses in expressions. A parenthesized part of an expression has precedence over its adjacent unparenthesized parts. For example, although multiplication has precedence over addition, in the expression
(A + B) * C
the addition will be evaluated first. Mathematically, this is perfectly natural. In this expression, the first operand of the multiplication operator is not available until the addition in the parenthesized subexpression is evaluated. Also, the expression from Section 7.2.1.2 could be specified as
(A + B) + (C + D)
to avoid overflow.
Languages that allow parentheses in arithmetic expressions could dispense with all precedence rules and simply associate all operators left to right or right to left. The programmer would specify the desired order of evaluation with parentheses. This approach would be simple because neither the author nor the readers of programs would need to remember any precedence or associativity rules. The disadvantage of this scheme is that it makes writing expressions more tedious, and it also seriously compromises the readability of the code. Yet this was the choice made by Ken Iverson, the designer of APL.
Recall that Ruby is a pure object-oriented language, which means, among other things, that every data value, including literals, is an object. Ruby supports the collection of arithmetic and logic operations that are included in the C-based languages. What sets Ruby apart from the C-based languages in the area of expressions is that all of the arithmetic, relational, and assignment operators, as well as array indexing, shifts, and bitwise logic operators, are implemented as methods. For example, the expression a + b is a call to the + method of the object referenced by a, passing the object referenced by b as a parameter.
One interesting result of the implementation of operators as methods is that they can be overridden by application programs. Therefore, these operators can be redefined. While it is often not useful to redefine operators for predefined types, it is useful, as we will see in Section 7.3, to define predefined operators for user-defined types, which can be done with operator overloading in some languages.
In C++ and Ada, operators are actually implemented as function calls.
As is the case with Ruby, all arithmetic and logic operations in Lisp are performed by subprograms. But in Lisp, the subprograms must be explicitly called. For example, to specify the C expression a + b * c in Lisp, one must write the following expression:
(+ a (* b c))
In this expression, + and * are the names of functions.
if-then-else statements can be used to perform a conditional expression assignment. For example, consider
if (count == 0)
  average = 0;
else
  average = sum / count;
In the C-based languages, this code can be specified more conveniently in an assignment statement using a conditional expression, which has the following form:
expression_1 ? expression_2 : expression_3
where expression_1 is interpreted as a Boolean expression. If expression_1 evaluates to true, the value of the whole expression is the value of expression_2; otherwise, it is the value of expression_3. For example, the effect of the example if-then-else can be achieved with the following assignment statement, using a conditional expression:
average = (count == 0) ? 0 : sum / count;
In effect, the question mark denotes the beginning of the then clause, and the colon marks the beginning of the else clause. Both clauses are mandatory. Note that ? is used in conditional expressions as a ternary operator.
Conditional expressions can be used anywhere in a program (in a C-based language) where any other expression can be used. In addition to the C-based languages, conditional expressions are provided in Perl, JavaScript, and Ruby.
A less commonly discussed design characteristic of expressions is the order of evaluation of operands. Variables in expressions are evaluated by fetching their values from memory. Constants are sometimes evaluated the same way. In other cases, a constant may be part of the machine language instruction and not require a memory fetch. If an operand is a parenthesized expression, all of the operators it contains must be evaluated before its value can be used as an operand.
If neither of the operands of an operator has side effects, then operand evaluation order is irrelevant. Therefore, the only interesting case arises when the evaluation of an operand does have side effects.
A side effect of a function, naturally called a functional side effect, occurs when the function changes either one of its parameters or a global variable. (A global variable is declared outside the function but is accessible in the function.)
Consider the following expression:
a + fun(a)
If fun does not have the side effect of changing a, then the order of evaluation of the two operands, a and fun(a), has no effect on the value of the expression. However, if fun changes a, there is an effect. Consider the following situation: fun returns 10 and changes the value of its parameter to 20. Suppose we have the following:
a = 10;
b = a + fun(a);
Then, if the value of a is fetched first (in the expression evaluation process), its value is 10 and the value of the expression is 20. But if the second operand is evaluated first, then the value of the first operand is 20 and the value of the expression is 30.
The following C program illustrates the same problem when a function changes a global variable that appears in an expression:
int a = 5;
int fun1(void) {
    a = 17;      /* side effect: changes the global a */
    return 3;
} /* end of fun1 */
int main(void) {
    a = a + fun1();
    return 0;
} /* end of main */
The value computed for a in main depends on the order of evaluation of the operands in the expression a + fun1(). The value of a will be either 8 (if a is evaluated first) or 20 (if the function call is evaluated first).
Note that functions in mathematics do not have side effects, because there is no notion of variables in mathematics. The same is true for functional programming languages. In both mathematics and functional programming languages, functions are much easier to reason about and understand than those in imperative languages, because their context is irrelevant to their meaning.
There are two possible solutions to the problem of operand evaluation order and side effects. First, the language designer could disallow function evaluation from affecting the value of expressions by simply disallowing functional side effects. Second, the language definition could state that operands in expressions are to be evaluated in a particular order and demand that implementors guarantee that order.
Disallowing functional side effects in the imperative languages is difficult, and it eliminates some flexibility for the programmer. Consider the case of C and C++, which have only functions, meaning that all subprograms return one value. To eliminate the side effects of two-way parameters and still provide subprograms that return more than one value, the values would need to be placed in a struct and the struct returned. Access to globals in functions would also have to be disallowed. However, when efficiency is important, using access to global variables to avoid parameter passing is an important method of increasing execution speed. In compilers, for example, global access to data such as the symbol table is commonplace.
The problem with having a strict evaluation order is that some code optimization techniques used by compilers involve reordering operand evaluations. A guaranteed order disallows those optimization methods when function calls are involved. There is, therefore, no perfect solution, as is borne out by actual language designs.
The Java language definition guarantees that operands appear to be evaluated in left-to-right order, eliminating the problem discussed in this section.
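The effect of a guaranteed order can be sketched in Java by restating the global-variable example with a static field. The class and method names below are hypothetical. Because operands are evaluated left to right, each form of the expression has exactly one possible value.

```java
// Hypothetical demo class; a static field plays the role of the global variable.
final class OperandOrder {
    static int a;

    // Functional side effect: changes a, then returns 3.
    static int fun1() {
        a = 17;
        return 3;
    }

    // The variable is the left operand, so its value (5) is fetched before fun1 runs.
    static int variableFirst() {
        a = 5;
        return a + fun1();
    }

    // The call is the left operand, so a has already been changed to 17 when it is fetched.
    static int callFirst() {
        a = 5;
        return fun1() + a;
    }

    public static void main(String[] args) {
        System.out.println(variableFirst()); // prints 8
        System.out.println(callFirst());     // prints 20
    }
}
```

The two results still differ, but each is now determined by the source text rather than by the implementation.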
The concept of referential transparency is related to and affected by functional side effects. A program has the property of referential transparency if any two expressions in the program that have the same value can be substituted for one another anywhere in the program, without affecting the action of the program. The value of a referentially transparent function depends entirely on its parameters.4 The connection of referential transparency and functional side effects is illustrated by the following example:
result1 = (fun(a) + b) / (fun(a) - c);
temp = fun(a);
result2 = (temp + b) / (temp - c);
If the function fun has no side effects, result1 and result2 will be equal, because the expressions assigned to them are equivalent. However, suppose fun has the side effect of adding 1 to either b or c. Then result1 would not be equal to result2. So, that side effect violates the referential transparency of the program in which the code appears.
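The following Java sketch makes the violation concrete. It assumes a hypothetical fun that returns its argument unchanged and adds 1 to c as a side effect. The starting values (a = 7, b = 5, c = 2) are chosen only so that the integer divisions come out evenly.

```java
// Hypothetical demo class; static fields stand in for the nonlocal variables b and c.
final class TransparencyDemo {
    static int b = 5;
    static int c;

    // Returns its argument, but also adds 1 to c (the side effect).
    static int fun(int x) {
        c = c + 1;
        return x;
    }

    // Mirrors result1: fun is called twice, so c is incremented twice before it is read.
    static int twoCalls(int a) {
        c = 2;
        return (fun(a) + b) / (fun(a) - c);
    }

    // Mirrors result2: fun is called once, so c is incremented only once.
    static int oneCall(int a) {
        c = 2;
        int temp = fun(a);
        return (temp + b) / (temp - c);
    }

    public static void main(String[] args) {
        System.out.println(twoCalls(7)); // (7 + 5) / (7 - 4)
        System.out.println(oneCall(7));  // (7 + 5) / (7 - 3)
    }
}
```

With these values the first form yields 4 and the second yields 3, so replacing the repeated call with temp changes the program's result.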
There are several advantages to referentially transparent programs. The most important of these is that the semantics of such programs is much easier to understand than the semantics of programs that are not referentially transparent. Being referentially transparent makes a function equivalent to a mathematical function, in terms of ease of understanding.
Because they do not have variables, programs written in pure functional languages are referentially transparent. Functions in a pure functional language cannot have state, which would be stored in local variables. If such a function uses a value from outside the function, that value must be a constant, since there are no variables. Therefore, the value of the function depends on the values of its parameters.
Referential transparency will be further discussed in Chapter 15.
Arithmetic operators are often used for more than one purpose. For example, + usually is used to specify integer addition and floating-point addition. Some languages—Java, for example—also use it for string catenation. This multiple use of an operator is called operator overloading and is generally thought to be acceptable, as long as neither readability nor reliability suffers.
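A brief Java sketch shows how the meaning of + is selected from the operand types. The class and method names are hypothetical. Because + is left associative, the position of the string operand determines which additions are arithmetic and which are catenations.

```java
// Hypothetical demo class illustrating the language-defined overloading of +.
final class PlusOverloadDemo {
    // 1 + 2 is integer addition (3); 3 + "3" is then string catenation.
    static String numbersFirst() {
        return 1 + 2 + "3";
    }

    // "1" + 2 is catenation ("12"); "12" + 3 is catenation again.
    static String stringFirst() {
        return "1" + 2 + 3;
    }

    // Mixed int and double operands: the int is coerced and floating-point addition is used.
    static double mixed() {
        return 1 + 2.5;
    }

    public static void main(String[] args) {
        System.out.println(numbersFirst()); // prints 33
        System.out.println(stringFirst());  // prints 123
        System.out.println(mixed());        // prints 3.5
    }
}
```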
As an example of the possible dangers of overloading, consider the use of the ampersand (&) in C++. As a binary operator, it specifies a bitwise logical AND operation. As a unary operator, however, its meaning is totally different. As a unary operator with a variable as its operand, the expression value is the address of that variable. In this case, the ampersand is called the address-of operator. For example, the execution of
x = &y;
causes the address of y to be placed in x. There are two problems with this multiple use of the ampersand. First, using the same symbol for two completely unrelated operations is detrimental to readability. Second, the simple keying error of leaving out the first operand for a bitwise AND operation can go undetected by the compiler, because it is interpreted as an address-of operator. Such an error may be difficult to diagnose.
Virtually all programming languages have a less serious but similar problem, which is often due to the overloading of the minus operator. The problem is only that the compiler cannot tell if the operator is meant to be binary or unary.5 So once again, failure to include the first operand when the operator is meant to be binary cannot be detected as an error by the compiler. However, the meanings of the two operations, unary and binary, are at least closely related, so readability is not adversely affected.
Some languages that support abstract data types (see Chapter 11), for example, C++, C#, and F#, allow the programmer to further overload operator symbols. For instance, suppose a user wants to define the * operator between a scalar integer and an integer array to mean that each element of the array is to be multiplied by the scalar. Such an operator could be defined by writing a function subprogram named * that performs this new operation. The compiler will choose the correct meaning when an overloaded operator is specified, based on the types of the operands, as with language-defined overloaded operators. For example, if this new definition for * is defined in a C# program, a C# compiler will use the new definition for * whenever the * operator appears with a simple integer as the left operand and an integer array as the right operand.
When sensibly used, user-defined operator overloading can aid readability. For example, if + and * are overloaded for a matrix abstract data type and A, B, C, and D are variables of that type, then
A * B + C * D
can be used instead of
MatrixAdd(MatrixMult(A, B), MatrixMult(C, D))
On the other hand, user-defined overloading can be harmful to readability. For one thing, nothing prevents a user from defining + to mean multiplication. Furthermore, seeing an * operator in a program, the reader must find both the types of the operands and the definition of the operator to determine its meaning. Any or all of these definitions could be in other files.
Another consideration is the process of building a software system from modules created by different groups. If the different groups overloaded the same operators in different ways, these differences would obviously need to be eliminated before putting the system together.
C++ has a few operators that cannot be overloaded. Among these are the class or structure member operator (.) and the scope resolution operator (::). Interestingly, operator overloading was one of the C++ features that was not copied into Java. However, it did reappear in C#.
The implementation of user-defined operator overloading is discussed in Chapter 9.
Type conversions are either narrowing or widening. A narrowing conversion converts a value to a type that cannot store even approximations of all of the values of the original type. For example, converting a double to a float in Java is a narrowing conversion, because the range of double is much larger than that of float. A widening conversion converts a value to a type that can include at least approximations of all of the values of the original type. For example, converting an int to a float in Java is a widening conversion. Widening conversions are nearly always safe, meaning that the approximate magnitude of the converted value is maintained. Narrowing conversions are not always safe—sometimes the magnitude of the converted value is changed in the process. For example, if the floating-point value 1.3E25 is converted to an integer in a Java program, the result will not be in any way related to the original value.
Although widening conversions are usually safe, they can result in reduced accuracy. In many language implementations, although integer-to-floating-point conversions are widening conversions, some precision may be lost. For example, in many cases, integers are stored in 32 bits, which allows at least 9 decimal digits of precision. But floating-point values are also stored in 32 bits, with only about seven decimal digits of precision (because of the space used for the exponent). So, integer-to-floating-point widening can result in the loss of two digits of precision.
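The loss of precision can be observed directly in Java, where int and float are both 32 bits wide. The sketch below uses a hypothetical helper that coerces an int to float and casts it back so the two values can be compared.

```java
// Hypothetical demo class for the precision lost in an int-to-float widening.
final class WideningPrecision {
    static int roundTrip(int value) {
        float f = value;   // implicit widening coercion, int to float
        return (int) f;    // explicit cast back so the two values can be compared
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(1000));      // small values survive unchanged
        System.out.println(roundTrip(123456789)); // the low-order digits are not preserved
    }
}
```

The nine-digit value 123456789 comes back as 123456792: the magnitude is preserved, but the last digits are not.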
Coercions of nonprimitive types are, of course, more complex. In Chapter 5, the complications of assignment compatibility of array and record types were discussed. There is also the question of what parameter types and return types of a method allow it to override a method in a superclass—only when the types are the same, or also some other situations. That issue, as well as the concept of subclasses as subtypes, are discussed in Chapter 12.
Type conversions can be either explicit or implicit. The following two subsections discuss these kinds of type conversions.
One of the design decisions concerning arithmetic expressions is whether an operator can have operands of different types. Languages that allow such expressions, which are called mixed-mode expressions, must define conventions for implicit operand type conversions because computers do not have binary operations that take operands of different types. Recall that in Chapter 5, coercion was defined as an implicit type conversion that is initiated by the compiler or runtime system. Type conversions explicitly requested by the programmer are referred to as explicit conversions, or casts, not coercions.
Although some operator symbols may be overloaded, we assume that a computer system, either in hardware or in some level of software simulation, has an operation for each operand type and operator defined in the language.6 For overloaded operators in a language that uses static type binding, the compiler chooses the correct type of operation on the basis of the types of the operands. When the two operands of an operator are not of the same type and that is legal in the language, the compiler must choose one of them to be coerced and generate the code for that coercion. In the following discussion, the coercion design choices of several common languages are examined.
Language designers are not in agreement on the issue of coercions in arithmetic expressions. Those against a broad range of coercions are concerned with the reliability problems that can result from such coercions, because they reduce the benefits of type checking. Those who would rather include a wide range of coercions are more concerned with the loss in flexibility that results from restrictions. The issue is whether programmers should need to be concerned with this category of errors or whether the compiler should detect them.
As a simple illustration of the problem, consider the following Java code:
int a;
float b, c, d;
. . .
d = b * a;
Assume that the second operand of the multiplication operator was supposed to be c, but because of a keying error it was typed as a. Because mixed-mode expressions are legal in Java, the compiler would not detect this as an error. It would simply insert code to coerce the value of the int operand, a, to float. If mixed-mode expressions were not legal in Java, this keying error would have been detected by the compiler as a type error.
Because error detection is reduced when mixed-mode expressions are allowed, F#, Ada, and ML do not allow them. For example, they do not allow mixing of integer and floating-point operands in expressions.
In most of the other common languages, there are no restrictions on mixed-mode arithmetic expressions.
The C-based languages have integer types that are smaller than the int type. In Java, these are byte and short. Operands of all of these types are coerced to int whenever virtually any operator is applied to them. So, while data can be stored in variables of these types, it cannot be manipulated before conversion to a larger type. For example, consider the following Java code:
byte a, b, c;
. . .
a = (byte) (b + c);
The values of b and c are coerced to int and an int addition is performed. The sum is therefore of type int, and because Java does not implicitly narrow an int to byte in an assignment, it must be cast explicitly to byte before it is put in a. Given the large size of the memories of contemporary computers, there is little incentive to use byte and short, unless a large number of them must be stored.
As a more extreme example of the dangers and costs of too much coercion, consider PL/I’s efforts to achieve flexibility in expressions. In PL/I, a character string variable can be the operand of an arithmetic operator with an integer as the other operand. At run time, the string is scanned for a numeric value. If the value happens to contain a decimal point, the value is assumed to be of floating-point type, the other operand is coerced to floating point, and the resulting operation is floating-point. This coercion policy is very expensive, because both the type check and the conversion must be done at run time. It also eliminates the possibility of detecting programmer errors in expressions, because a binary operator can combine an operand of any type with an operand of virtually any other type.
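The coercion of byte operands to int described above can be checked with a short Java sketch. The class and method names are hypothetical. The first method keeps the int sum, and the second narrows it back to byte with an explicit cast.

```java
// Hypothetical demo class for arithmetic on byte operands.
final class ByteArithmetic {
    // b and c are coerced to int, so the sum is an int and cannot overflow the byte range.
    static int sumAsInt(byte b, byte c) {
        return b + c;
    }

    // Narrowing the int sum back to byte keeps only the low-order 8 bits.
    static byte sumAsByte(byte b, byte c) {
        return (byte) (b + c);
    }

    public static void main(String[] args) {
        byte b = 100;
        byte c = 100;
        System.out.println(sumAsInt(b, c));  // prints 200
        System.out.println(sumAsByte(b, c)); // prints -56, because 200 does not fit in a byte
    }
}
```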
Most languages provide some capability for doing explicit conversions, both widening and narrowing. In some cases, warning messages are produced when an explicit narrowing conversion results in a significant change to the value of the object being converted.
In the C-based languages, explicit type conversions are called casts. To specify a cast, the desired type is placed in parentheses just before the expression to be converted, as in
(int)angle
One of the reasons for the parentheses around the type name in these conversions is that the first of these languages, C, has several two-word type names, such as long int.
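As an illustration of what such a cast does in one C-based language, the Java sketch below applies (int) to several double values. The class and method names are hypothetical. In Java the fractional part is discarded, and a value too large for int is clamped to the largest int, so the result bears no useful relation to the original.

```java
// Hypothetical demo class for explicit narrowing casts from double to int.
final class CastDemo {
    static int toInt(double angle) {
        return (int) angle;   // explicit narrowing conversion (a cast)
    }

    public static void main(String[] args) {
        System.out.println(toInt(3.99));   // fraction discarded
        System.out.println(toInt(-3.99));  // truncation is toward zero
        System.out.println(toInt(1.3E25)); // far outside the range of int
    }
}
```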
In ML and F#, the casts have the syntax of function calls. For example, in F# we could have the following:
float(sum)
A number of errors can occur during expression evaluation. If the language requires type checking, either static or dynamic, then operand type errors cannot occur. The errors that can occur because of coercions of operands in expressions have already been discussed. The other kinds of errors are due to the limitations of computer arithmetic and the inherent limitations of arithmetic. The most common error occurs when the result of an operation cannot be represented in the memory cell where it must be stored. This is called overflow or underflow, depending on whether the result was too large or too small. One limitation of arithmetic is that division by zero is disallowed. Of course, the fact that it is not mathematically allowed does not prevent a program from attempting to do it.
Floating-point overflow, underflow, and division by zero are examples of run-time errors, which are sometimes called exceptions. Language facilities that allow programs to detect and deal with exceptions are discussed in Chapter 14.
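How these errors surface differs from language to language. The following Java sketch, with hypothetical names, shows one set of behaviors: int overflow wraps around silently, integer division by zero raises an exception, and floating-point division by zero produces an infinity rather than an exception. Other languages make different choices.

```java
// Hypothetical demo class for arithmetic errors in Java expressions.
final class ArithmeticErrors {
    // int overflow is not reported; the result wraps around to the smallest int.
    static int wrappedSum() {
        int max = Integer.MAX_VALUE;
        return max + 1;
    }

    // Math.addExact is a library method that reports the overflow as an exception.
    static boolean addExactThrows() {
        try {
            Math.addExact(Integer.MAX_VALUE, 1);
            return false;
        } catch (ArithmeticException e) {
            return true;
        }
    }

    // Integer division by zero raises ArithmeticException at run time.
    static boolean intDivideByZeroThrows() {
        int zero = 0;
        try {
            int quotient = 1 / zero;
            return false;
        } catch (ArithmeticException e) {
            return true;
        }
    }

    // Floating-point division by zero yields an infinity instead of an exception.
    static double doubleDivideByZero() {
        double zero = 0.0;
        return 1.0 / zero;
    }

    public static void main(String[] args) {
        System.out.println(wrappedSum() == Integer.MIN_VALUE);
        System.out.println(addExactThrows());
        System.out.println(intDivideByZeroThrows());
        System.out.println(doubleDivideByZero());
    }
}
```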
In addition to arithmetic expressions, programming languages support relational and Boolean expressions.
A relational operator is an operator that compares the values of its two operands. A relational expression has two operands and one relational operator. The value of a relational expression is Boolean, except when Boolean is not a type included in the language. The relational operators are often overloaded for a variety of types. The operation that determines the truth or falsehood of a relational expression depends on the operand types. It can be simple, as for integer operands, or complex, as for character string operands. Typically, the types of the operands that can be used for relational operators are numeric types, strings, and enumeration types.
The Fortran I designers used English abbreviations for the relational operators because the symbols > and < were not on the card punches at the time of Fortran I’s design (mid-1950s).
The syntax of the relational operators for equality and inequality differs among some programming languages. For example, for inequality, the C-based languages use !=, Fortran uses .NE. or <>, and ML and F# use <>.
JavaScript and PHP have two additional relational operators, === and !==. These are similar to their relatives, == and !=, but prevent their operands from being coerced. For example, the expression
"7" == 7
is true in JavaScript, because when a string and a number are the operands of a relational operator, the string is coerced to a number. However,
"7" === 7
is false, because no coercion is done on the operands of this operator.
Ruby uses == for the equality relational operator that uses coercions, and eql? for equality with no coercions. Ruby uses === only in the when clause of its case statement, as discussed in Chapter 8.
The relational operators always have lower precedence than the arithmetic operators, so that in expressions such as
a + 1 > 2 * b
the arithmetic expressions are evaluated first.
Boolean expressions consist of Boolean variables, Boolean constants, relational expressions, and Boolean operators. The operators usually include those for the AND, OR, and NOT operations, and sometimes for exclusive OR and equivalence. Boolean operators usually take only Boolean operands (Boolean variables, Boolean literals, or relational expressions) and produce Boolean values.
In the mathematics of Boolean algebras, the OR and AND operators must have equal precedence. However, the C-based languages assign a higher precedence to AND than OR. Perhaps this resulted from the baseless correlation of multiplication with AND and of addition with OR, which would naturally assign higher precedence to AND.
Because arithmetic expressions can be the operands of relational expressions, and relational expressions can be the operands of Boolean expressions, the three categories of operators must be placed in different precedence levels, relative to each other.
The precedence of the arithmetic, relational, and Boolean operators in the C-based languages is as follows:
Versions of C prior to C99 are odd among the popular imperative languages in that they have no Boolean type and thus no Boolean values. Instead, numeric values are used to represent Boolean values. In place of Boolean operands, scalar variables (numeric or character) and constants are used, with zero considered false and all nonzero values considered true. The result of evaluating such an expression is an integer, with the value 0 if false and 1 if true. Arithmetic expressions can also be used for Boolean expressions in C99 and C++.
One odd result of C’s design of relational expressions is that the following expression is legal:
a > b > c
The leftmost relational operator is evaluated first because the relational operators of C are left associative, producing either 0 or 1. Then, this result is compared with the variable c. There is never a comparison between b and c in this expression.
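Java rejects a > b > c as a type error, but the C evaluation can be emulated to show how far it is from the mathematical reading. In the sketch below, which uses hypothetical names, cStyle spells out the two steps C performs and intended is the comparison a programmer most likely meant.

```java
// Hypothetical demo class emulating how C evaluates a > b > c.
final class ChainedComparison {
    // Step 1: a > b yields 0 or 1. Step 2: that integer is compared with c.
    static boolean cStyle(int a, int b, int c) {
        int first = (a > b) ? 1 : 0;
        return first > c;
    }

    // The mathematical reading: b lies strictly between c and a.
    static boolean intended(int a, int b, int c) {
        return (a > b) && (b > c);
    }

    public static void main(String[] args) {
        System.out.println(cStyle(5, 3, 2));   // false: 1 > 2 does not hold
        System.out.println(intended(5, 3, 2)); // true: 5 > 3 and 3 > 2
    }
}
```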
Some languages, including Perl and Ruby, provide two sets of the binary logic operators, && and and for AND and || and or for OR. One difference between && and and (and || and or) is that the spelled versions have lower precedence. Also, and and or have equal precedence, but && has higher precedence than ||.
When the nonarithmetic operators of the C-based languages are included, there are more than 40 operators and at least 14 different levels of precedence. This is clear evidence of the richness of the collections of operators and the complexity of expressions possible in these languages.
Readability dictates that a language should include a Boolean type, as was stated in Chapter 6, rather than simply using numeric types in Boolean expressions. Some error detection is lost in the use of numeric types for Boolean operands, because any numeric expression, whether intended or not, is a legal operand to a Boolean operator. In the other imperative languages, any non-Boolean expression used as an operand of a Boolean operator is detected as an error.
A short-circuit evaluation of an expression is one in which the result is determined without evaluating all of the operands and/or operators. For example, the value of the arithmetic expression
(13 * a) * (b / 13 - 1)
is independent of the value of (b / 13 - 1) if a is 0, because 0 * x = 0 for any x. So, when a is 0, there is no need to evaluate (b / 13 - 1) or perform the second multiplication. However, in arithmetic expressions, this shortcut is not easily detected during execution, so it is never taken.
The value of the Boolean expression
(a >= 0) && (b < 10)
is independent of the second relational expression if a < 0, because the expression (FALSE && (b < 10)) is FALSE for all values of b. So, when a is less than zero, there is no need to evaluate b, the constant 10, the second relational expression, or the && operation. Unlike the case of arithmetic expressions, this shortcut easily can be discovered during execution.
To illustrate a potential problem with non-short-circuit evaluation of Boolean expressions, suppose Java did not use short-circuit evaluation. A table lookup loop could be written using the while statement. One simple version of Java code for such a lookup, assuming that list, which has listlen elements, is the array to be searched and key is the searched-for value, is
index = 0;
while ((index < listlen) && (list[index] != key))
    index = index + 1;
If evaluation is not short-circuit, both relational expressions in the Boolean expression of the while statement are evaluated, regardless of the value of the first. Thus, if key is not in list, the program will terminate with a subscript out-of-range exception. The same iteration that has index == listlen will reference list[listlen], which causes the indexing error because list is declared to have listlen-1 as an upper-bound subscript value.
If a language provides short-circuit evaluation of Boolean expressions and it is used, this is not a problem. In the preceding example, a short-circuit evaluation scheme would evaluate the first operand of the AND operator, but it would skip the second operand if the first operand is false.
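Both behaviors can be observed in Java itself, because its & operator evaluates both Boolean operands. The sketch below uses hypothetical method names and takes list.length in place of listlen. It runs the lookup once with && and once with &.

```java
// Hypothetical demo class contrasting short-circuit and full evaluation in a table lookup.
final class TableLookup {
    // Short-circuit version: returns the index of key, or list.length if key is absent.
    static int find(int[] list, int key) {
        int index = 0;
        while ((index < list.length) && (list[index] != key))
            index = index + 1;
        return index;
    }

    // Non-short-circuit version: & always evaluates list[index], even when
    // index == list.length. Returns true if that caused a subscript error.
    static boolean eagerLookupFails(int[] list, int key) {
        int index = 0;
        try {
            while ((index < list.length) & (list[index] != key))
                index = index + 1;
            return false;
        } catch (ArrayIndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        int[] list = {4, 8, 15};
        System.out.println(find(list, 8));              // prints 1
        System.out.println(find(list, 99));             // prints 3 (not found)
        System.out.println(eagerLookupFails(list, 99)); // prints true
    }
}
```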
A language that provides short-circuit evaluations of Boolean expressions and also has side effects in expressions allows subtle errors to occur. Suppose that short-circuit evaluation is used on an expression and part of the expression that contains a side effect is not evaluated; then the side effect will occur only in complete evaluations of the whole expression. If program correctness depends on the side effect, short-circuit evaluation can result in a serious error. For example, consider the Java expression
(a > b) || ((b++) / 3)
In this expression, b is changed (in the second arithmetic expression) only when a <= b. If the programmer assumed b would be changed every time this expression is evaluated during execution (and the program’s correctness depends on it), the program will fail.
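A minimal sketch of this behavior, written in C, where || accepts an integer right operand. The function name test_expr is hypothetical, and b is passed by pointer only so that a caller can observe whether the side effect took place.

```c
/* Evaluates (a > b) || ((b++) / 3). The increment of *b happens only
   when the left operand is false, that is, when a <= *b. */
int test_expr(int a, int *b) {
    return (a > *b) || (((*b)++) / 3);
}
```

Calling test_expr(5, &b) with b equal to 3 leaves b unchanged, whereas test_expr(1, &b) increments it to 4.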
In the C-based languages, the usual AND and OR operators, && and ||, respectively, are short-circuit. However, these languages also have bitwise AND and OR operators, & and |, respectively, that can be used on Boolean-valued operands and are not short-circuit. Of course, the bitwise operators are only equivalent to the usual Boolean operators if all operands are restricted to being either 0 (for false) or 1 (for true).
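The two differences between && and & can be made visible with a small C sketch. The helper touch and the calls counter are hypothetical, introduced here only to record how many operands were evaluated.

```c
/* Hypothetical helper: records that it was called, then returns value. */
static int touch(int value, int *calls) {
    (*calls)++;
    return value;
}

/* Logical AND: the second operand is skipped when the first is 0. */
int logical_and(int a, int b, int *calls) {
    *calls = 0;
    return touch(a, calls) && touch(b, calls);
}

/* Bitwise AND: both operands are always evaluated. */
int bitwise_and(int a, int b, int *calls) {
    *calls = 0;
    return touch(a, calls) & touch(b, calls);
}
```

With operands 2 and 1, which are both true in the C sense, logical_and yields 1 but bitwise_and yields 0, which is why the equivalence holds only for operands restricted to 0 and 1.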
All of the logical operators of Ruby, Perl, ML, F#, and Python are short-circuit evaluated.
As we have previously stated, the assignment statement is one of the central constructs in imperative languages. It provides the mechanism by which the user can dynamically change the bindings of values to variables. In the following section, the simplest form of assignment is discussed. Subsequent sections describe a variety of alternatives.
Nearly all programming languages currently being used use the equal sign for the assignment operator. All of these must use something different from an equal sign for the equality relational operator to avoid confusion with their assignment operator.
ALGOL 60 pioneered the use of := as the assignment operator, which avoids the confusion of assignment with equality. Ada also uses this assignment operator.
The design choices of how assignments are used in a language have varied widely. In some languages, such as Fortran and Ada, an assignment can appear only as a stand-alone statement, and the destination is restricted to a single variable. There are, however, many alternatives.
Perl allows conditional targets on assignment statements. For example, consider
($flag ? $count1 : $count2) = 0;
which is equivalent to
if ($flag) {
  $count1 = 0;
} else {
  $count2 = 0;
}
A compound assignment operator is a shorthand method of specifying a commonly needed form of assignment. The form of assignment that can be abbreviated with this technique has the destination variable also appearing as the first operand in the expression on the right side, as in
a = a + b
Compound assignment operators were introduced by ALGOL 68, were later adopted in a slightly different form by C, and are part of the other C-based languages, as well as Perl, JavaScript, Python, and Ruby. The syntax of these assignment operators is the catenation of the desired binary operator to the = operator. For example,
sum += value;
is equivalent to
sum = sum + value;
The languages that support compound assignment operators have versions for most of their binary operators.
The C-based languages, Perl, and JavaScript include two special unary arithmetic operators that are actually abbreviated assignments. They combine increment and decrement operations with assignment. The operators ++ for increment and -- for decrement can be used either in expressions or to form stand-alone single-operator assignment statements. They can appear either as prefix operators, meaning that they precede the operands, or as postfix operators, meaning that they follow the operands. In the assignment statement
sum = ++ count;
the value of count is incremented by 1 and then assigned to sum. This operation could also be stated as
count = count + 1;
sum = count;
If the same operator is used as a postfix operator, as in
sum = count ++;
the assignment of the value of count to sum occurs first; then count is incremented. The effect is the same as that of the two statements
sum = count;
count = count + 1;
An example of the use of the unary increment operator to form a complete assignment statement is
count ++;
which simply increments count. It does not look like an assignment, but it certainly is one. It is equivalent to the statement
count = count + 1;
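The prefix and postfix forms can be compared side by side in a short C sketch. The function names are hypothetical; count is passed by pointer so that both the value assigned to sum and the updated count are visible to the caller.

```c
/* Prefix form: count is incremented first, then its new value is assigned. */
int prefix_sum(int *count) {
    int sum = ++(*count);
    return sum;
}

/* Postfix form: the old value of count is assigned, then count is incremented. */
int postfix_sum(int *count) {
    int sum = (*count)++;
    return sum;
}
```

Starting from count equal to 5, prefix_sum returns 6 and leaves count at 6; a following call to postfix_sum also returns 6 but leaves count at 7.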
When two unary operators apply to the same operand, the association is right to left. For example, in
- count ++
count is first incremented and then negated. So, it is equivalent to
- (count ++)
rather than
(- count) ++
In the C-based languages, Perl, and JavaScript, the assignment statement produces a result, which is the same as the value assigned to the target. It can therefore be used as an expression and as an operand in other expressions. This design treats the assignment operator much like any other binary operator, except that it has the side effect of changing its left operand. For example, in C, it is common to write statements such as
while ((ch = getchar()) != EOF) { ... }
In this statement, the next character from the standard input file, usually the keyboard, is gotten with getchar and assigned to the variable ch. The result, or value assigned, is then compared with the constant EOF. If ch is not equal to EOF, the compound statement {...} is executed. Note that the assignment must be parenthesized—in the languages that support assignment as an expression, the precedence of the assignment operator is lower than that of the relational operators. Without the parentheses, the new character would be compared with EOF first. Then, the result of that comparison, either 0 or 1, would be assigned to ch.
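The effect of the parentheses can be checked with a C sketch that reads from a string instead of standard input, so the result is reproducible. The function next_char is a hypothetical stand-in for getchar, and the other function names are likewise assumptions made for this illustration.

```c
#include <stdio.h>   /* for EOF */

/* Hypothetical stand-in for getchar: reads from a string, EOF at the end. */
static int next_char(const char **p) {
    if (**p == '\0')
        return EOF;
    return (unsigned char)*(*p)++;
}

/* Parenthesized form: ch receives the character, which is then compared. */
int count_chars(const char *text) {
    int ch, n = 0;
    while ((ch = next_char(&text)) != EOF)
        n++;
    return n;
}

/* Without parentheses, != is applied first, so ch receives 0 or 1. */
int first_ch_without_parens(const char *text) {
    int ch;
    ch = next_char(&text) != EOF;
    return ch;
}
```

For the input "abc", count_chars returns 3, while first_ch_without_parens returns 1 rather than the character code of 'a'.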
The disadvantage of allowing assignment statements to be operands in expressions is that it provides yet another kind of expression side effect. This type of side effect can lead to expressions that are difficult to read and understand. An expression with any kind of side effect has this disadvantage. Such an expression cannot be read as an expression, which in mathematics is a denotation of a value, but only as a list of instructions with an odd order of execution. For example, the expression
a = b + (c = d / b) - 1
denotes the instructions
Assign d / b to c
Assign b + c to temp
Assign temp - 1 to a
Note that the treatment of the assignment operator as any other binary operator allows the effect of multiple-target assignments, such as
sum = count = 0;
in which count is first assigned zero, and then count’s value is assigned to sum. This form of multiple-target assignment is also legal in Python.
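Both uses of assignment as an operand can be tried in C. The sketch assumes integer operands and hypothetical function names; stepwise spells out the three instructions that the embedded form denotes, and c_out exists only to expose the value of c.

```c
/* Embedded assignment: a = b + (c = d / b) - 1. */
int embedded(int b, int d, int *c_out) {
    int a, c;
    a = b + (c = d / b) - 1;
    *c_out = c;
    return a;
}

/* The same computation written as three separate instructions. */
int stepwise(int b, int d, int *c_out) {
    int a, c, temp;
    c = d / b;        /* Assign d / b to c      */
    temp = b + c;     /* Assign b + c to temp   */
    a = temp - 1;     /* Assign temp - 1 to a   */
    *c_out = c;
    return a;
}

/* Assignment associates right to left: count = 0 runs first, then sum = count. */
int chained(void) {
    int sum = 7, count = 9;
    sum = count = 0;
    return sum + count;
}
```

With b equal to 4 and d equal to 12, both embedded and stepwise produce 6 and set c to 3.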
There is a loss of error detection in the C design of the assignment operation that frequently leads to program errors. In particular, if we type
if (x = y) ...
instead of
if (x == y) ...
which is an easily made mistake, it is not detectable as an error by the compiler. Rather than testing a relational expression, the value that is assigned to x is tested (in this case, it is the value of y that reaches this statement). This is actually a result of two design decisions: allowing assignment to behave like an ordinary binary operator and using two very similar operators, = and ==, to have completely different meanings. This is another example of the safety deficiencies of C and C++ programs. Note that Java and C# allow only boolean expressions in their if statements, which prevents this problem.
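The difference between the intended test and the mistyped one can be seen in a small C sketch. The function names are hypothetical. The doubled parentheses in assign_test are there only because many compilers, when warnings are enabled, flag an unparenthesized assignment used as a condition; the language itself accepts either form.

```c
/* Intended test: are x and y equal? */
int equal_test(int x, int y) {
    if (x == y)
        return 1;
    return 0;
}

/* Mistyped test: assigns y to x, then tests the assigned value. */
int assign_test(int x, int y) {
    if ((x = y))
        return 1;
    return 0;
}
```

For x equal to 3 and y equal to 5 the mistyped version takes the branch even though the values differ, and for x and y both 0 it does not take the branch even though they are equal.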
Several recent programming languages, including Perl and Ruby, provide multiple-target, multiple-source assignment statements. For example, in Perl one can write
($first, $second, $third) = (20, 40, 60);
The semantics is that 20 is assigned to $first, 40 is assigned to $second, and 60 is assigned to $third. If the values of two variables must be interchanged, this can be done with a single assignment, as with
($first, $second) = ($second, $first);
This correctly interchanges the values of $first and $second, without the use of a temporary variable (at least one created and managed by the programmer).
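For contrast, a language without multiple assignment, such as C, requires the programmer to manage the temporary explicitly. The following is a minimal sketch with a hypothetical function name, not a description of how Perl implements the swap.

```c
/* Swaps two ints; the temporary is created and managed by the programmer. */
void swap_ints(int *first, int *second) {
    int temp = *first;
    *first = *second;
    *second = temp;
}
```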
The PDP-11 computer, on which C was first implemented, has autoincrement and autodecrement addressing modes, which are hardware versions of the increment and decrement operators of C when they are used as array indices. One might guess from this that the design of these C operators was based on the design of the PDP-11 architecture. That guess would be wrong, however, because the C operators were inherited from the B language, which was designed before the first PDP-11.
The syntax of the simplest form of Ruby’s multiple assignment is similar to that of Perl, except the left and right sides are not parenthesized. Also, Ruby includes a few more elaborate versions of multiple assignments, which are not discussed here.
All of the identifiers used in pure functional languages and some of those used in other functional languages are just names of values. As such, their values never change. For example, in ML, names are bound to values with the val declaration, whose form is exemplified in the following:
val cost = quantity * price;
If cost appears on the left side of a subsequent val declaration, that declaration creates a new version of the name cost, which has no relationship with the previous version, which is then hidden.
F# has a somewhat similar declaration that uses the let reserved word. The difference between F#’s let and ML’s val is that let creates a new scope, whereas val does not. In fact, val declarations are often nested in let constructs in ML. let and val are further discussed in Chapter 15.
Mixed-mode expressions were discussed in Section 7.4.1. Frequently, assignment statements also are mixed mode. The design question is: Does the type of the expression have to be the same as the type of the variable being assigned, or can coercion be used in some cases of type mismatch?
C, C++, and Perl use coercion rules for mixed-mode assignment that are similar to those they use for mixed-mode expressions; that is, many of the possible type mixes are legal, with coercion freely applied.
In a clear departure from C++, Java and C# allow mixed-mode assignment only if the required coercion is widening. So, an int value can be assigned to a float variable, but not vice versa. Disallowing half of the possible mixed-mode assignments is a simple but effective way to increase the reliability of Java and C#, relative to C and C++.
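The C side of this contrast can be sketched as follows. The function names are hypothetical. The narrowing assignment compiles in C without a cast; the equivalent assignment in Java or C# is rejected unless an explicit cast is written.

```c
/* Narrowing: the double is coerced to int implicitly, discarding the fraction. */
int narrow_to_int(double value) {
    int result = value;
    return result;
}

/* Widening: int to float, which Java and C# also permit implicitly. */
float widen_to_float(int value) {
    float result = value;
    return result;
}
```

Calling narrow_to_int(3.99) yields 3 with no diagnostic required by the language, which illustrates the kind of silent loss of information that the Java and C# rule is meant to prevent.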
Of course, in functional languages, where assignments are just used to name values, there is no such thing as a mixed-mode assignment.
Expressions consist of constants, variables, parentheses, function calls, and operators. Assignment statements include target variables, assignment operators, and expressions.
The semantics of an expression is determined in large part by the order of evaluation of operators. The associativity and precedence rules for operators in the expressions of a language determine the order of operator evaluation in those expressions. Operand evaluation order is important if functional side effects are possible. Type conversions can be widening or narrowing. Some narrowing conversions produce erroneous values. Implicit type conversions, or coercions, in expressions are common, although they eliminate the error-detection benefit of type checking, thus lowering reliability.
Assignment statements have appeared in a wide variety of forms, including conditional targets, assigning operators, and list assignments.
Define operator precedence and operator associativity.
What is a ternary operator?
What is a prefix operator?
What operator usually has right associativity?
What is a nonassociative operator?
What associativity rules are used by APL?
What is the difference between the way operators are implemented in C++ and Ruby?
Define functional side effect.
What is a coercion?
What is a conditional expression?
What is an overloaded operator?
Define narrowing and widening conversions.
In JavaScript, what is the difference between == and ===?
What is a mixed-mode expression?
What is referential transparency?
What are the advantages of referential transparency?
How does operand evaluation order interact with functional side effects?
What is short-circuit evaluation?
Name a language that always does short-circuit evaluation of Boolean expressions. Name one that never does it.
How does C support relational and Boolean expressions?
What is the purpose of a compound assignment operator?
What is the associativity of C’s unary arithmetic operators?
What is one possible disadvantage of treating the assignment operator as if it were an arithmetic operator?
What two languages include multiple assignments?
What mixed-mode assignments are allowed in Java?
What mixed-mode assignments are allowed in ML?
What is a cast?
When might you want the compiler to ignore type differences in an expression?
State your own arguments for and against allowing mixed-mode arithmetic expressions.
Do you think the elimination of overloaded operators in your favorite language would be beneficial? Why or why not?
Would it be a good idea to eliminate all operator precedence rules and require parentheses to show the desired precedence in expressions? Why or why not?
Should C’s assigning operations (for example, +=) be included in other languages (that do not already have them)? Why or why not?
Should C’s single-operand assignment forms (for example, ++count) be included in other languages (that do not already have them)? Why or why not?
Describe a situation in which the add operator in a programming language would not be commutative.
Describe a situation in which the add operator in a programming language would not be associative.
Assume the following rules of associativity and precedence for expressions:
Show the order of evaluation of the following expressions by parenthesizing all subexpressions and placing a superscript on the right parenthesis to indicate order. For example, for the expression
a + b * c + d
the order of evaluation would be represented as
((a + (b * c)^1)^2 + d)^3
a * b - 1 + c
a * (b - 1) / c mod d
(a - b) / c & (d * e / a - 3)
-a or c = d and e
a > b xor c or d <= 17
-a + b
Show the order of evaluation of the expressions of Problem 9, assuming that there are no precedence rules and all operators associate right to left.
Write a BNF description of the precedence and associativity rules defined for the expressions in Problem 9. Assume the only operands are the names a, b, c, d, and e.
Using the grammar of Problem 11, draw parse trees for the expressions of Problem 9.
Let the function fun be defined as
int fun(int* k) {
  *k += 4;
  return 3 * (*k) - 1;
}
Suppose fun is used in a program as follows:
void main() {
  int i = 10, j = 10, sum1, sum2;
  sum1 = (i / 2) + fun(&i);
  sum2 = fun(&j) + (j / 2);
}
What are the values of sum1 and sum2
if the operands in the expressions are evaluated left to right?
if the operands in the expressions are evaluated right to left?
What is your primary argument against (or for) the operator precedence rules of APL?
Explain why it is difficult to eliminate functional side effects in C.
For some language of your choice, make up a list of operator symbols that could be used to eliminate all operator overloading.
Determine whether the narrowing explicit type conversions in two languages you know provide error messages when a converted value loses its usefulness.
Should an optimizing compiler for C or C++ be allowed to change the order of subexpressions in a Boolean expression? Why or why not?
Consider the following C program:
int fun(int *i) {
  *i += 5;
  return 4;
}
void main() {
  int x = 3;
  x = x + fun(&x);
}
What is the value of x after the assignment statement in main, assuming
operands are evaluated left to right.
operands are evaluated right to left.
Why does Java specify that operands in expressions are all evaluated in left-to-right order?
Explain how the coercion rules of a language affect its error detection.
Run the code given in Problem 13 (in the Problem Set) on some system that supports C to determine the values of sum1 and sum2. Explain the results.
Rewrite the program of Programming Exercise 1 in C++, Java, and C#, run them, and compare the results.
Write a test program in your favorite language that determines and outputs the precedence and associativity of its arithmetic and Boolean operators.
Write a Java program that exposes Java’s rule for operand evaluation order when one of the operands is a method call.
Repeat Programming Exercise 4 with C++.
Repeat Programming Exercise 4 with C#.
Write a program in either C++, Java, or C# that illustrates the order of evaluation of expressions used as actual parameters to a method.
Write a C program that has the following statements:
int a, b;
a = 10;
b = a + fun();
printf("With the function call on the right, ");
printf(" b is: %d\n", b);
a = 10;
b = fun() + a;
printf("With the function call on the left, ");
printf(" b is: %d\n", b);
and define fun to add 10 to a. Explain the results.
Write a program in either Java, C++, or C# that performs a large number of floating-point operations and an equal number of integer operations and compare the time required.
The flow of control, or execution sequence, in a program can be examined at several levels. In Chapter 7, the flow of control within expressions, which is governed by operator associativity and precedence rules, was discussed. At the highest level is the flow of control among program units, which is discussed in Chapters 9 and 13. Between these two extremes is the important issue of the flow of control among statements, which is the subject of this chapter.
We begin by giving an overview of the evolution of control statements. This topic is followed by a thorough examination of selection statements, both those for two-way and those for multiple selection. Next, we discuss the variety of looping statements that have been developed and used in programming languages. We then take a brief look at the problems associated with unconditional branch statements. Finally, we describe the guarded command control statements.
Computations in imperative-language programs are accomplished by evaluating expressions and assigning the resulting values to variables. However, there are few useful programs that consist entirely of assignment statements. At least two additional linguistic mechanisms are necessary to make the computations in programs flexible and powerful: some means of selecting among alternative control flow paths (of statement execution) and some means of causing the repeated execution of statements or sequences of statements. Statements that provide these kinds of capabilities are called control statements.
Computations in functional programming languages are accomplished by evaluating expressions and applying functions to given parameters. Furthermore, the flow of execution among the expressions and functions is controlled by other expressions and functions, although some of them are similar to the control statements in the imperative languages.
The control statements of the first successful programming language, Fortran, were, in effect, designed by the architects of the IBM 704. All were directly related to machine language instructions, so their capabilities were more the result of instruction design than of language design. At the time, little was known about the difficulty of programming, and, as a result, the control statements of Fortran in the mid-1950s were thought to be entirely adequate. By today’s standards, however, they are considered seriously lacking.
A great deal of research and discussion was devoted to control statements in the 10 years between the mid-1960s and the mid-1970s. One of the primary conclusions of these efforts was that, although a single control statement (a selectable goto) is minimally sufficient, a language that is designed not to include a goto needs only a small number of different control statements. In fact, it was proven that all algorithms that can be expressed by flowcharts can be coded in a programming language with only two control statements: one for choosing between two control flow paths and one for logically controlled iterations (Böhm and Jacopini, 1966). An important result of this is that the unconditional branch statement is superfluous—potentially useful but nonessential. This fact, combined with the practical problems of using unconditional branches, or gotos, led to a great deal of debate about the goto, as will be discussed in Section 8.4.
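As a small illustration of the two-statement result, the following C sketch uses only one selection statement and one logically controlled loop: no goto, no counting loop, and no early exit. The function name and the task of counting even values are assumptions chosen for the example.

```c
/* Counts the even values in list using only if and while. */
int count_even(const int *list, int listlen) {
    int index = 0, evens = 0;
    while (index < listlen) {          /* logically controlled iteration */
        if (list[index] % 2 == 0)      /* two-way selection */
            evens = evens + 1;
        index = index + 1;
    }
    return evens;
}
```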
Programmers care less about the results of theoretical research on control statements than they do about writability and readability. All languages that have been widely used include more control statements than the two that are minimally required, because writability is enhanced by a larger number and wider variety of control statements. For example, rather than requiring the use of a logically controlled loop statement for all loops, it is easier to write programs when a counter-controlled loop statement can be used to build loops that are naturally controlled by a counter. The primary factor that restricts the number of control statements in a language is readability, because the presence of a large number of statement forms demands that program readers learn a larger language. Recall that few people learn all of the statements of a relatively large language; instead, they learn the subset they choose to use, which is often a different subset from that used by the programmer who wrote the program they are trying to read. On the other hand, too few control statements can require the use of lower-level statements, such as the goto, which also makes programs less readable.
The question as to the best collection of control statements to provide the required capabilities and the desired writability has been widely debated. It is essentially a question of how much a language should be expanded to increase its writability at the expense of its simplicity, size, and readability.
A control structure is a control statement and the collection of statements whose execution it controls.
There is only one design issue that is relevant to all of the selection and iteration control statements: Should the control structure have multiple entries? All selection and iteration constructs control the execution of code segments, and the question is whether the execution of those code segments always begins with the first statement in the segment. It is now generally believed that multiple entries add little to the flexibility of a control statement, relative to the decrease in readability caused by the increased complexity. Note that multiple entries are possible only in languages that include gotos and statement labels.
At this point, the reader might wonder why multiple exits from control structures are not considered a design issue. The reason is that all programming languages allow some form of multiple exits from control structures, the rationale being as follows: If all exits from a control structure are restricted to transferring control to the first statement following the structure, where control would flow if the control structure had no explicit exit, there is no harm to readability and also no danger. However, if an exit can have an unrestricted target and therefore can result in a transfer of control to anywhere in the program unit that contains the control structure, the harm to readability is the same as for a goto statement anywhere else in a program. Languages that have a goto statement allow it to appear anywhere, including in a control structure. Therefore, the issue is the inclusion of a goto, not whether multiple exits from control structures are allowed.
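A restricted exit of the harmless kind can be sketched in C with break, which always transfers control to the first statement after the enclosing loop. The function name is hypothetical.

```c
/* Returns the index of the first negative element, or listlen if none. */
int first_negative(const int *list, int listlen) {
    int index;
    for (index = 0; index < listlen; index++) {
        if (list[index] < 0)
            break;      /* second exit, but its target is fixed */
    }
    return index;       /* both exits arrive at this statement */
}
```

Because both the normal termination and the break lead to the same statement, a reader does not need to search the rest of the program unit for the exit target.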
A selection statement provides the means of choosing between two or more execution paths in a program. Such statements are fundamental and essential parts of all programming languages, as was proven by Böhm and Jacopini.
Selection statements fall into two general categories: two-way and n-way, or multiple selection. Two-way selection statements are discussed in Section 8.2.1; multiple-selection statements are covered in Section 8.2.2.
Although the two-way selection statements of contemporary imperative languages are quite similar, there are some variations in their designs. The general form of a two-way selector is as follows:
if control_expression
  then clause
  else clause
The design issues for two-way selectors can be summarized as follows:
What is the form and type of the expression that controls the selection?
How are the then and else clauses specified?
How should the meaning of nested selectors be specified?
Control expressions are specified in parentheses if the then reserved word (or some other syntactic marker) is not used to introduce the then clause. In those cases where the then reserved word (or alternative marker) is used, there is less need for the parentheses, so they are often omitted, as in Ruby.
In C89, which did not have a Boolean data type, arithmetic expressions were used as control expressions. This can also be done in Python, C99, and C++. However, in those languages either arithmetic or Boolean expressions can be used. In other contemporary languages, only Boolean expressions can be used for control expressions.
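As a small illustration of the point about Python, the following sketch uses an arithmetic value directly as a control expression and then writes the same selection with an explicit Boolean expression. The function names are hypothetical, chosen only for this example.

```python
def describe(count):
    # An arithmetic value used directly as the control expression:
    # zero is treated as false, any nonzero number as true.
    if count:
        return "nonzero"
    else:
        return "zero"


def describe_explicit(count):
    # The same selection written with an explicit Boolean expression.
    if count != 0:
        return "nonzero"
    else:
        return "zero"
```

Both functions select the same clause for every numeric argument; the second form is the only one permitted in languages that require a Boolean control expression.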
In many languages, the then and else clauses appear as either single statements or compound statements. One variation of this is Perl, in which all then and else clauses must be compound statements, even if they have only one statement. Many languages use braces to form compound statements, which serve as the bodies of then and else clauses. In Python and Ruby, the then and else clauses are statement sequences, rather than compound statements. In Ruby, the complete selection statement is terminated with the reserved word end.
Python uses indentation to specify compound statements. For example,
if x > y :
    x = y
    print("case 1")
All statements equally indented are included in the compound statement. Notice that rather than then, a colon is used to introduce the then clause in Python.
The variations in clause form have implications for the specification of the meaning of nested selectors, as discussed in the next subsection.
Recall that in Chapter 3, we discussed the problem of syntactic ambiguity of a straightforward grammar for a two-way selector statement. That ambiguous grammar was as follows:
<if_stmt> → if <logic_expr> then <stmt>
          | if <logic_expr> then <stmt> else <stmt>
The issue is that when a selection statement is nested in the then clause of a selection statement, it is not clear with which if an else clause should be associated. This problem is reflected in the semantics of selection statements. Consider the following Java-like code:
if (sum == 0)
  if (count == 0)
    result = 0;
else
  result = 1;
This statement can be interpreted in two different ways, depending on whether the else clause is matched with the first then clause or the second. Notice that the indentation seems to indicate that the else clause belongs with the first then clause. However, with the exceptions of Python and F#, indentation has no effect on semantics in contemporary languages and is therefore ignored by their compilers.
The crux of the problem in this example is that the else clause follows two then clauses with no intervening else clause, and there is no syntactic indicator to specify a matching of the else clause to one of the then clauses. In Java, as in many other imperative languages, the static semantics of the language specify that the else clause is always paired with the nearest previous unpaired then clause. A static semantics rule, rather than a syntactic entity, is used to provide the disambiguation. So, in the example, the else clause would be paired with the second then clause. The disadvantage of using a rule rather than some syntactic entity is that although the programmer may have meant the else clause to be the alternative to the first then clause and the compiler found the structure syntactically correct, its semantics is the opposite. To force the alternative semantics in Java, the inner if is put in a compound, as in
if (sum == 0) {
  if (count == 0)
    result = 0;
}
else
  result = 1;
C, C++, and C# have the same problem as Java with selection statement nesting. Because Perl requires that all then and else clauses be compound, it does not. In Perl, the previous code would be written as follows:
if (sum == 0) {
  if (count == 0) {
    result = 0;
  }
} else {
  result = 1;
}
If the alternative semantics were needed, it would be
if (sum == 0) {
  if (count == 0) {
    result = 0;
  }
  else {
    result = 1;
  }
}
Another way to avoid the issue of nested selection statements is to use an alternative means of forming compound statements. Consider the syntactic structure of the Java if statement. The then clause follows the control expression and the else clause is introduced by the reserved word else. When the then clause is a single statement and the else clause is present, although there is no need to mark the end, the else reserved word in fact marks the end of the then clause. When the then clause is a compound, it is terminated by a right brace. However, if the last clause in an if, whether then or else, is not a compound, there is no syntactic entity to mark the end of the whole selection statement. The use of a special word for this purpose resolves the question of the semantics of nested selectors and also adds to the readability of the statement. This is the design of the selection statement in Ruby. For example, consider the following Ruby statement:
if a > b then sum = sum + a
              acount = acount + 1
else sum = sum + b
     bcount = bcount + 1
end
The design of this statement is more regular than that of the selection statements of the C-based languages, because the form is the same regardless of the number of statements in the then and else clauses. (This is also true for Perl.) Recall that in Ruby, the then and else clauses consist of statement sequences rather than compound statements. The first interpretation of the selector example at the beginning of this section, in which the else clause is matched to the nested if, can be written in Ruby as follows:
if sum == 0 then
  if count == 0 then
    result = 0
  else
    result = 1
  end
end
Because the end reserved word closes the nested if, it is clear that the else clause is matched to the inner then clause.
The second interpretation of the selection statement at the beginning of this section, in which the else clause is matched to the outer if, can be written in Ruby as follows:
if sum == 0 then
  if count == 0 then
    result = 0
  end
else
  result = 1
end
The following statement, written in Python, is semantically equivalent to the last Ruby statement above:
if sum == 0 :
    if count == 0 :
        result = 0
else:
    result = 1
If the line else: were indented to begin in the same column as the nested if, the else clause would be matched with the inner if.
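The effect of indentation on else matching can be checked directly. The sketch below wraps the two Python forms in functions so their results can be compared; the function names and the parameter name total (used in place of sum, which is also the name of a Python built-in) are choices made for this illustration only.

```python
def else_with_outer(total, count):
    # The else is aligned with the outer if, so it pairs with it.
    result = None
    if total == 0:
        if count == 0:
            result = 0
    else:
        result = 1
    return result


def else_with_inner(total, count):
    # The else is aligned with the nested if, so it pairs with that one.
    result = None
    if total == 0:
        if count == 0:
            result = 0
        else:
            result = 1
    return result
```

With total equal to 0 and count equal to 5, the first function leaves result unset (None), while the second sets it to 1. With total equal to 3, the roles are reversed.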
ML does not have a problem with nested selectors because it does not allow else-less if statements.
In the functional languages ML, F#, and LISP, the selector is not a statement; it is an expression that results in a value. Therefore, it can appear anywhere any other expression can appear. Consider the following example selector written in F#:
let y =
  if x > 0 then x
  else 2 * x;;
This creates the name y and sets it to either x or 2 * x, depending on whether x is greater than zero.
In F#, the type of the value returned by the then clause of an if construct must be the same as that of the value returned by its else clause. If there is no else clause, the then clause cannot return a value of a normal type. In this case, it can only return a unit type, which is a special type that means no value. A unit type is represented in code as ().
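Python is not one of the languages named in this discussion, but its conditional expression is a rough analogue of a selector that yields a value. The sketch below is only an illustration, with a hypothetical function name. In this expression form the else part cannot be omitted, although Python, unlike F#, does not require the two alternatives to have the same type.

```python
def scaled(x):
    # The selector is an expression, so it can appear on the right
    # side of an assignment like any other expression.
    y = x if x > 0 else 2 * x
    return y
```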
The multiple-selection statement allows the selection of one of any number of statements or statement groups. It is, therefore, a generalization of a selector. In fact, two-way selectors can be built with a multiple selector.
The need to choose from among more than two control paths in programs is common. Although a multiple selector can be built from two-way selectors and gotos, the resulting structures are cumbersome, unreliable, and difficult to write and read. Therefore, the need for a special structure is clear.
Some of the design issues for multiple selectors are similar to some of those for two-way selectors. For example, one issue is the question of the type of expression on which the selector is based. In this case, the range of possibilities is larger, in part because the number of possible selections is larger. A two-way selector needs an expression with only two possible values. Another issue is whether single statements, compound statements, or statement sequences may be selected. Next, there is the question of whether only a single selectable segment can be executed when the statement is executed. This is not an issue for two-way selectors, because they always allow only one of the clauses to be on a control path during one execution. As we shall see, the resolution of this issue for multiple selectors is a trade-off between reliability and flexibility. Another issue is the form of the case value specifications. Finally, there is the issue of what should result from the selector expression evaluating to a value that does not select one of the segments. (Such a value would be unrepresented among the selectable segments.) The choice here is between simply disallowing the situation from arising and having the statement do nothing at all when it does arise.
The following is a summary of these design issues:
What is the form and type of the expression that controls the selection?
How are the selectable segments specified?
Is execution flow through the structure restricted to include just a single selectable segment?
How are the case values specified?
How should unrepresented selector expression values be handled, if at all?
The C multiple-selector statement, switch, which is also part of C++, Java, and JavaScript, is a relatively primitive design. Its general form is
switch (expression) {
  case constant_expression_1: statement_1;
  . . .
  case constant_expression_n: statement_n;
  [default: statement_n+1]
}
where the control expression and the constant expressions are of some discrete type. This includes integer types, as well as characters and enumeration types. The selectable statements can be statement sequences, compound statements, or blocks. The optional default segment is for unrepresented values of the control expression. If the value of the control expression is not represented and no default segment is present, then the statement does nothing.
The switch statement does not provide implicit branches at the end of its code segments. This allows control to flow through more than one selectable code segment on a single execution. Consider the following example:
switch (index) {
  case 1:
  case 3: odd += 1;
          sumodd += index;
  case 2:
  case 4: even += 1;
          sumeven += index;
  default: printf("Error in switch, index = %d\n", index);
}
This code prints the error message on every execution. Likewise, the code for the 2 and 4 constants is executed every time the code at the 1 or 3 constants is executed. To separate these segments logically, an explicit branch must be included. The break statement, which is actually a restricted goto, is normally used for exiting switch statements. break transfers control to the first statement after the compound statement in which it appears.
The following switch statement uses break to restrict each execution to a single selectable segment:
switch (index) {
  case 1:
  case 3: odd += 1;
          sumodd += index;
          break;
  case 2:
  case 4: even += 1;
          sumeven += index;
          break;
  default: printf("Error in switch, index = %d\n", index);
}
Occasionally, it is convenient to allow control to flow from one selectable code segment to another. For example, in the example above, the segments for the case values 1 and 2 are empty, allowing control to flow to the segments for 3 and 4, respectively. This is the reason why there are no implicit branches in the switch statement. The reliability problem with this design arises when the mistaken absence of a break statement in a segment allows control to flow to the next segment incorrectly. The designers of C’s switch traded a decrease in reliability for an increase in flexibility. Studies have shown, however, that the ability to have control flow from one selectable segment to another is rarely used. C’s switch is modeled on the multiple-selection statement in ALGOL 68, which also does not have implicit branches from selectable segments.
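The fall-through behavior can be modeled outside C. The following Python sketch is a simplified simulation, not how any C compiler works: a switch body is represented as an ordered list of segments, each with its case labels, a name standing in for its code, and a flag saying whether it ends with break. All names here (DEFAULT, run_switch, NO_BREAKS, WITH_BREAKS) are hypothetical.

```python
DEFAULT = object()  # hypothetical marker for the default segment


def run_switch(value, segments):
    """Return the names of the segments executed for the given value.

    segments is an ordered list of (labels, name, has_break) triples,
    where labels is a tuple of case constants or the DEFAULT marker.
    """
    # Find the entry point: the first segment labeled with the value,
    # or the default segment if no label matches.
    entry = None
    for pos, (labels, _name, _has_break) in enumerate(segments):
        if labels is not DEFAULT and value in labels:
            entry = pos
            break
    if entry is None:
        for pos, (labels, _name, _has_break) in enumerate(segments):
            if labels is DEFAULT:
                entry = pos
                break
    executed = []
    if entry is None:
        return executed  # unrepresented value, no default: do nothing
    # Control flows through successive segments until a break is met.
    for _labels, name, has_break in segments[entry:]:
        executed.append(name)
        if has_break:
            break
    return executed


NO_BREAKS = [((1, 3), "odd", False), ((2, 4), "even", False),
             (DEFAULT, "error", False)]
WITH_BREAKS = [((1, 3), "odd", True), ((2, 4), "even", True),
               (DEFAULT, "error", False)]
```

Under this model, run_switch(1, NO_BREAKS) executes all three segments, while run_switch(1, WITH_BREAKS) executes only the odd segment, which mirrors the two C examples above.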
The C switch statement has virtually no restrictions on the placement of the case expressions, which are treated as if they were normal statement labels. This laxness can result in highly complex structure within the switch body. The following example is taken from Harbison and Steele (2002).
switch (x)
  default:
  if (prime(x))
    case 2: case 3: case 5: case 7:
      process_prime(x);
  else
    case 4: case 6: case 8: case 9: case 10:
      process_composite(x);
This code may appear to have diabolically complex form, but it was designed for a real problem and works correctly and efficiently to solve that problem.
The Java switch prevents this sort of complexity by disallowing case expressions from appearing anywhere except the top level of the body of the switch.
The C# switch statement differs from that of its C-based predecessors in two ways. First, C# has a static semantics rule that disallows the implicit execution of more than one segment. The rule is that every selectable segment must end with an explicit unconditional branch statement: either a break, which transfers control out of the switch statement, or a goto, which can transfer control to one of the selectable segments (or virtually anywhere else). For example,
switch (value) {
  case -1:
    Negatives++;
    break;
  case 0:
    Zeros++;
    goto case 1;
  case 1:
    Positives++;
    break;
  default:
    Console.WriteLine("Error in switch \n");
    break;
}
Note that Console.WriteLine is the method for displaying strings in C#.
The other way C#’s switch differs from that of its predecessors is that the control expression and the case statements can be strings in C#.
PHP’s switch uses the syntax of C’s switch but allows more type flexibility. The case values can be any of the PHP scalar types—string, integer, or double precision. As with C, if there is no break at the end of the selected segment, execution continues into the next segment.
Ruby has two forms of multiple-selection constructs, both of which are called case expressions and both of which yield the value of the last expression evaluated. The only version of Ruby’s case expressions that is described here is semantically similar to a list of nested if statements:
case
  when Boolean_expression then expression
  . . .
  when Boolean_expression then expression
  [else expression]
end
The semantics of this case expression is that the Boolean expressions are evaluated one at a time, top to bottom. The value of the case expression is the value of the first then expression whose Boolean expression is true. The else represents true in this statement, and the else clause is optional. For example,
leap = case
       when year % 400 == 0 then true
       when year % 100 == 0 then false
       else year % 4 == 0
       end
This case expression evaluates to true if year is a leap year.
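For comparison, the same top-to-bottom evaluation can be sketched in Python with an if, elif, else chain. The function name is_leap is hypothetical; the conditions are taken directly from the Ruby example.

```python
def is_leap(year):
    # Conditions are tested in order; the first true one decides.
    if year % 400 == 0:
        return True
    elif year % 100 == 0:
        return False
    else:
        return year % 4 == 0
```

The order of the tests matters: 1900 is divisible by 4 but is rejected by the second test before the final one is reached.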
The other Ruby case expression form is similar to the switch of Java. Perl does not have a multiple-selection statement, and Python had none before the match statement was added in version 3.10.
A multiple selection statement is essentially an n-way branch to segments of code, where n is the number of selectable segments. Implementing such a statement must be done with multiple conditional branch instructions. Consider again the general form of the C switch statement, with breaks:
switch (expression) {
  case constant_expression_1: statement_1;
    break;
  . . .
  case constant_expression_n: statement_n;
    break;
  [default: statement_n+1]
}
One simple translation of this statement follows:
          Code to evaluate expression into t
          goto branches
label_1:  code for statement_1
          goto out
          . . .
label_n:  code for statement_n
          goto out
default:  code for statement_n+1
          goto out
branches: if t = constant_expression_1 goto label_1
          . . .
          if t = constant_expression_n goto label_n
          goto default
out:
The code for the selectable segments precedes the branches so that the targets of the branches are all known when the branches are generated. An alternative to these coded conditional branches is to put the case values and labels in a table and use a linear search with a loop to find the correct label. This requires less space than the coded conditionals.
The use of conditional branches or a linear search on a table of cases and labels is a simple but inefficient approach that is acceptable when the number of cases is small, say less than 10. It takes an average of about half as many tests as there are cases to find the right one. For the default case to be chosen, all other cases must be tested. In statements with 10 or more cases, the low efficiency of this form is not justified by its simplicity.
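The cost of the table-with-linear-search approach can be made concrete with a small Python model. This is a sketch of the idea rather than of any particular compiler; the function name, the table contents, and the label strings are all hypothetical.

```python
def linear_dispatch(t, table, default_label):
    """Search a list of (case_value, label) pairs in order.

    Returns the selected label and the number of comparisons made.
    """
    tests = 0
    for case_value, label in table:
        tests += 1
        if t == case_value:
            return label, tests
    # Every case was tested before the default could be chosen.
    return default_label, tests


TABLE = [(1, "L1"), (3, "L3"), (5, "L5"), (7, "L7")]
```

Here linear_dispatch(5, TABLE, "default") needs three comparisons, and an unrepresented value such as 9 needs all four before the default label is returned.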
When the number of cases is 10 or greater, the compiler can build a hash table of the segment labels, which would result in approximately equal (and short) times to choose any of the selectable segments. If the language allows ranges of values for case expressions, as in Ruby, the hash method is not suitable. For these situations, a binary search table of case values and segment addresses is better.
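Both alternatives can be sketched in Python, with a dict standing in for the compiler's hash table and the standard bisect module providing the binary search. The tables, labels, and function names are hypothetical, and the range table is assumed to be sorted and non-overlapping with inclusive bounds.

```python
import bisect

# Hash-table dispatch: one lookup regardless of which case is chosen.
HASH_TABLE = {2: "prime", 3: "prime", 4: "composite", 5: "prime"}


def hash_dispatch(t, default_label="default"):
    return HASH_TABLE.get(t, default_label)


# Range dispatch: (low, high, label) triples sorted by low bound.
RANGES = [(0, 9, "one_digit"), (10, 99, "two_digits"),
          (100, 999, "three_digits")]
LOWS = [low for low, _high, _label in RANGES]


def range_dispatch(t, default_label="default"):
    # Binary search for the last range whose low bound is <= t.
    pos = bisect.bisect_right(LOWS, t) - 1
    if pos >= 0:
        _low, high, label = RANGES[pos]
        if t <= high:
            return label
    return default_label
```

A hash table maps individual values to labels, so it cannot directly answer which range contains a value; the sorted table answers that in a number of steps proportional to the logarithm of the number of ranges.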
If the range of the case values is relatively small and more than half of the whole range of values is represented, an array whose indices are the case values and whose values are the segment labels can be built. Array elements whose indices are not among the represented case values are filled with the default segment’s label. Then the correct segment label is found by array indexing, which is very fast.
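The array approach is commonly called a jump table. A minimal Python sketch follows, assuming integer case values; the function names and the example cases are hypothetical, and a real compiler would store code addresses rather than strings.

```python
def build_jump_table(cases, default_label):
    """Build a list indexed by (case value - lowest case value)."""
    low, high = min(cases), max(cases)
    # Unrepresented values inside the range get the default label.
    table = [default_label] * (high - low + 1)
    for value, label in cases.items():
        table[value - low] = label
    return low, table


def jump_dispatch(t, low, table, default_label):
    offset = t - low
    # Values outside the table's range also select the default.
    if 0 <= offset < len(table):
        return table[offset]
    return default_label


CASES = {1: "odd", 3: "odd", 2: "even", 4: "even", 6: "even"}
LOW, JUMP_TABLE = build_jump_table(CASES, "default")
```

For these cases the table has six entries, five of them represented; the value 5 falls inside the range but selects the default label, as does any value below 1 or above 6.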
Of course, choosing among these approaches is an additional burden on the compiler. In many compilers, only two different methods are used. As in other situations, determining and using the most efficient method costs more compiler time.
In many situations, a switch or case statement is inadequate for multiple selection (Ruby’s case is an exception). For example, when selections must be made on the basis of a Boolean expression rather than some ordinal type, nested two-way selectors can be used to simulate a multiple selector. To alleviate the poor readability of deeply nested two-way selectors, some languages, such as Perl and Python, have been extended specifically for this use. The extension allows some of the special words to be left out. In particular, else-if sequences are replaced with a single special word, and the closing special word on the nested if is dropped. The nested selector is then called an else-if clause. Consider the following Python selector statement (note that else-if is spelled elif in Python):
if count < 10 :
    bag1 = True
elif count < 100 :
    bag2 = True
elif count < 1000 :
    bag3 = True
which is equivalent to the following:
if count < 10 :
    bag1 = True
else :
    if count < 100 :
        bag2 = True
    else :
        if count < 1000 :
            bag3 = True
The else-if version (the first) is the more readable of the two. Notice that this example is not easily simulated with a switch statement, because each selectable statement is chosen on the basis of a Boolean expression. Therefore, the else-if statement is not a redundant form of switch. In fact, none of the multiple selectors in contemporary languages are as general as the if-then-else-if statement. An operational semantics description of a general selector statement with else-if clauses, in which the E’s are logic expressions and the S’s are statements, is given here:
     if E_1 goto 1
     if E_2 goto 2
     . . .
1:   S_1
     goto out
2:   S_2
     goto out
     . . .
out: . . .
From this description, we can see the difference between multiple selection structures and else-if statements: In a multiple selection statement, all the E’s would be restricted to comparisons between the value of a single expression and some other values.
Languages that do not include the else-if statement can use the same control structure, with only slightly more typing.
The Python example if-then-else-if statement above can be written as the Ruby case statement:
case
  when count < 10 then bag1 = true
  when count < 100 then bag2 = true
  when count < 1000 then bag3 = true
end
Else-if statements are based on the common mathematics statement, the conditional expression.
The Scheme multiple selector, which is based on mathematical conditional expressions, is a special form function named COND. COND is a slightly generalized version of the mathematical conditional expression; it allows more than one predicate to be true at the same time. Because different mathematical conditional expressions have different numbers of parameters, COND does not require a fixed number of actual parameters. Each parameter to COND is a pair of expressions in which the first is a predicate (it evaluates to either #T or #F).
The general form of COND is
(COND
  (predicate_1 expression_1)
  (predicate_2 expression_2)
  . . .
  (predicate_n expression_n)
  [(ELSE expression_n+1)]
)
where the ELSE clause is optional.
The semantics of COND is as follows: The predicates of the parameters are evaluated one at a time, in order from the first, until one evaluates to #T. The expression that follows the first predicate that is found to be #T is then evaluated and its value is returned as the value of COND. If none of the predicates is true and there is an ELSE, its expression is evaluated and the value is returned. If none of the predicates is true and there is no ELSE, the value of COND is unspecified. Therefore, all CONDs should include an ELSE.
Consider the following example call to COND:
(COND
  ((> x y) "x is greater than y")
  ((< x y) "y is greater than x")
  (ELSE "x and y are equal")
)
Note that string literals evaluate to themselves, so that when this call to COND is evaluated, it produces a string result.
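The evaluation order described for COND can be modeled in Python by passing each predicate and expression as a function of no arguments, so that nothing is evaluated until it is needed. This is only a sketch of the semantics, not Scheme itself; cond, compare, and the UNSPECIFIED marker are hypothetical names, and returning a marker is just one way to model an unspecified value.

```python
UNSPECIFIED = object()  # stands in for Scheme's unspecified value


def cond(clauses, else_expr=None):
    """Evaluate (predicate, expression) pairs in order.

    Each predicate and expression is a zero-argument function.
    """
    for predicate, expression in clauses:
        if predicate():
            # Only the expression of the first true predicate runs.
            return expression()
    if else_expr is not None:
        return else_expr()
    return UNSPECIFIED


def compare(x, y):
    return cond(
        [(lambda: x > y, lambda: "x is greater than y"),
         (lambda: x < y, lambda: "y is greater than x")],
        else_expr=lambda: "x and y are equal",
    )
```

Because evaluation stops at the first true predicate, more than one predicate may be true without causing a conflict; the earliest one wins.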
An iterative statement is one that causes a statement or collection of statements to be executed zero, one, or more times. An iterative statement is often called a loop. Every programming language from Plankalkül on has included some method of repeating the execution of segments of code. Iteration is the very essence of the power of the computer. If some means of repetitive execution of a statement or collection of statements were not possible, programmers would be required to state every action in sequence; useful programs would be huge and inflexible and take unacceptably large amounts of time to write and mammoth amounts of memory to store.
The first iterative statements in programming languages were directly related to arrays. This resulted from the fact that in the earliest years of computers, computing was largely numerical in nature, frequently using loops to process data in arrays.
Several categories of iteration control statements have been developed. The primary categories are defined by how designers answered two basic design questions:
How is the iteration controlled?
Where should the control mechanism appear in the loop statement?
The primary possibilities for iteration control are logical, counting, or a combination of the two. The main choices for the location of the control mechanism are the top of the loop or the bottom of the loop. Top and bottom here are logical, rather than physical, denotations. The issue is not the physical placement of the control mechanism; rather, it is whether the mechanism is executed and affects control before or after execution of the statement’s body. A third option, which allows the user to decide where to put the control, at the top, at the bottom, or even within the controlled segment, is discussed in Section 8.3.3.
The body of an iterative statement is the collection of statements whose execution is controlled by the iteration statement. We use the term pretest to mean that the test for loop completion occurs before the loop body is executed and posttest to mean that it occurs after the loop body is executed. The iteration statement and the associated loop body together form an iteration construct.
A counting iterative control statement has a variable, called the loop variable, in which the count value is maintained. It also includes some means of specifying the initial and terminal values of the loop variable, and the difference between sequential loop variable values, often called the stepsize. The initial, terminal, and stepsize specifications of a loop are called the loop parameters.
Although logically controlled loops are more general than counter-controlled loops, they are not necessarily more commonly used. Because counter-controlled loops are more complex, their design is more demanding.
Counter-controlled loops are sometimes supported by machine instructions designed for that purpose. Unfortunately, machine architecture can outlive the prevailing approaches to programming at the time of the architecture design. For example, VAX computers have a very convenient instruction for the implementation of posttest counter-controlled loops, which Fortran had at the time of the design of the VAX (mid-1970s). But Fortran no longer had such a loop by the time VAX computers became widely used (it had been replaced by a pretest loop). Furthermore, no other widely used language of the time had a posttest counting loop.
There are many design issues for iterative counter-controlled statements. The nature of the loop variable and the loop parameters provide a number of design issues. The type of the loop variable and that of the loop parameters obviously should be the same or at least compatible, but what types should be allowed? One apparent choice is integer, but what about enumeration, character, and floating-point types? Another question is whether the loop variable is a normal variable, in terms of scope, or whether it should have some special scope. Allowing the user to change the loop variable or the loop parameters within the loop can lead to code that is very difficult to understand, so another question is whether the additional flexibility that might be gained by allowing such changes is worth that additional complexity. A similar question arises about the number of times and the specific time when the loop parameters are evaluated: If they are evaluated just once, loops are simple but less flexible. Finally, what is the value of the loop variable after loop termination, if its scope extends beyond the loop?
The following is a summary of these design issues:
What are the type and scope of the loop variable?
Should it be legal for the loop variable or loop parameters to be changed in the loop, and if so, does the change affect loop control?
Should the loop parameters be evaluated only once, or once for every iteration?
What is the value of the loop variable after loop termination?
The issue of the value of the loop variable after loop termination is solved in some languages, such as Fortran 90, by making the loop variable undefined after loop termination. Other languages, such as Ada, make the scope of the loop variable the loop itself.
The for Statement of the C-Based Languages
The general form of C’s for statement is
for (expression_1; expression_2; expression_3)
    loop body
The loop body can be a single statement, a compound statement, or a null statement.
Because assignment statements in C produce results and thus can be considered expressions, the expressions in a for statement are often assignment statements. The first expression is for initialization and is evaluated only once, when the for statement execution begins. The second expression is the loop control and is evaluated before each execution of the loop body. As is usual in C, a zero value means false and all nonzero values mean true. Therefore, if the value of the second expression is zero, the for is terminated; otherwise, the loop body statements are executed. In C99, the expression also could be a Boolean type. A C99 Boolean type stores only the values 0 or 1. The last expression in the for is executed after each execution of the loop body. It is often used to increment the loop counter. An operational semantics description of the C for statement is shown next. Because C expressions can be used as statements, expression evaluations are shown as statements.
expression_1
loop:
if expression_2 = 0 goto out
[loop body]
expression_3
goto loop
out: . . .
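The operational semantics above can be mirrored in a language that lacks a C-style for. The following is a minimal Python sketch, assuming the three expressions are supplied as callables; the name c_style_for, its parameters, and the state dictionary are illustrative and do not come from the text.

```python
def c_style_for(init, test, step, body):
    """Mimic: for (expression_1; expression_2; expression_3) loop body."""
    init()            # expression_1: evaluated once, when execution begins
    while test():     # expression_2: evaluated before each execution of the body
        body()
        step()        # expression_3: evaluated after each execution of the body


# Usage: the skeletal loop for (count = 1; count <= 10; count++)
state = {"count": 0, "visited": []}
c_style_for(
    init=lambda: state.update(count=1),
    test=lambda: state["count"] <= 10,
    step=lambda: state.update(count=state["count"] + 1),
    body=lambda: state["visited"].append(state["count"]),
)
```

The sketch does not model a continue inside the body, which in C would still cause expression_3 to be evaluated.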
Following is an example of a skeletal C for statement:
for (count = 1; count <= 10; count++) {
    . . .
}
All of the expressions of C’s for are optional. An absent second expression is considered true, so a for without one is potentially an infinite loop. If the first and/or third expressions are absent, no assumptions are made. For example, if the first expression is absent, it simply means that no initialization takes place.
Note that C’s for need not count. It can easily model counting and logical loop structures, as demonstrated in the next section.
The C for design choices are the following: There is no explicit loop variable and no loop parameters. All involved variables can be changed in the loop body. The expressions are evaluated in the order stated previously. Although it can create havoc, it is legal to branch into a C for loop body.
C’s for is one of the most flexible, because each of the expressions can comprise multiple expressions, which in turn allow multiple loop variables that can be of any type. When multiple expressions are used in a single expression of a for statement, they are separated by commas. All C expressions have values, and this form of multiple expression is no exception. The value of such a multiple expression is the value of the last component.
Consider the following for statement:
for (count1 = 0, count2 = 1.0;
     count1 <= 10 && count2 <= 100.0;
     sum = ++count1 + count2, count2 *= 2.5);
The operational semantics description of this is
count1 = 0
count2 = 1.0
loop:
if count1 > 10 goto out
if count2 > 100.0 goto out
count1 = count1 + 1
sum = count1 + count2
count2 = count2 * 2.5
goto loop
out: . . .
The example C for statement does not need and thus does not have a loop body. All the desired actions are part of the for statement itself, rather than in its body. The first and third expressions are multiple statements. In both of these cases, the whole expression is evaluated, but the resulting value is not used in the loop control.
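To make the effect of this bodiless loop concrete, the operational semantics can be transcribed into Python and traced. This is only a sketch: the function name run_example is hypothetical, and sum is renamed total to avoid shadowing Python's built-in function.

```python
def run_example():
    """Trace of: for (count1 = 0, count2 = 1.0;
                      count1 <= 10 && count2 <= 100.0;
                      sum = ++count1 + count2, count2 *= 2.5);"""
    count1 = 0
    count2 = 1.0
    total = None                      # "sum" in the C statement
    while count1 <= 10 and count2 <= 100.0:
        # the loop body is empty; all work happens in the third expression
        count1 += 1                   # ++count1
        total = count1 + count2       # sum = count1 + count2
        count2 *= 2.5                 # count2 *= 2.5
    return count1, count2, total
```

Under these semantics the loop stops because count2 exceeds 100.0 after six iterations, well before count1 reaches its limit.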
The for statement of C99 and C++ differs from that of earlier versions of C in two ways. First, in addition to an arithmetic expression, it can use a Boolean expression for loop control. Second, the first expression can include variable definitions. For example,
for (int count = 0; count < len; count++) { . . . }
The scope of a variable defined in the for statement is from its definition to the end of the loop body.
The for statement of Java and C# is like that of C++, except that the loop control expression is restricted to boolean.
In all of the C-based languages, the last two loop parameters are evaluated with every iteration. Furthermore, variables that appear in the loop parameter expression can be changed in the loop body. Therefore, these loops can be complex and potentially unreliable.
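The practical difference between loop parameters that are evaluated once and a control expression that is evaluated on every iteration can be sketched in Python, where a for over range fixes its bounds up front while a while loop behaves like the second expression of a C for. The function names and the limit parameter are assumptions made for this illustration.

```python
def counting_with_range(n):
    """Bounds fixed once: changing n in the body has no effect on the count."""
    iterations = 0
    for _ in range(n):      # range(n) is evaluated once, before the first iteration
        n += 1
        iterations += 1
    return iterations


def counting_with_while(n, limit):
    """Test re-evaluated each time, like expression_2 of a C for."""
    i = 0
    iterations = 0
    while i < n:            # sees every change made to n in the body
        if n < limit:
            n += 1          # moving the terminal value extends the loop
        i += 1
        iterations += 1
    return iterations
```

With n = 3, the first function iterates three times, while the second, given a limit of 6, iterates six times because the body keeps moving the terminal value.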
The for Statement of Python
The general form of Python’s for is
for loop_variable in object:
- loop body
[else:
- else clause]
The loop variable is assigned the value in the object, which is often a range, one for each execution of the loop body. After loop termination, the loop variable has the value last assigned to it. The loop variable can be changed in the loop body, but such a change has no effect on loop operation. The else clause, when present, is executed if the loop terminates normally.
Consider the following example:
for count in [2, 4, 6]:
    print(count)
produces
2
4
6
For most simple counting loops in Python, the range function is used. range takes one, two, or three parameters. The following examples demonstrate the actions of range:
range(5) returns [0, 1, 2, 3, 4]
range(2, 7) returns [2, 3, 4, 5, 6]
range(0, 8, 2) returns [0, 2, 4, 6]
Note that range never returns the highest value in a given parameter range.
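The remaining points about Python's for (the else clause, changes to the loop variable, and its value after termination) can be checked with a short sketch. The function names are hypothetical. Note that in Python 3, range produces its values lazily rather than building a list, which does not affect the loop behavior shown here.

```python
def first_negative(values):
    """Return the first negative value, or None if the loop ends normally."""
    for v in values:
        if v < 0:
            result = v
            break             # abnormal exit: the else clause is skipped
    else:
        result = None         # runs only when the loop was not left by break
    return result


def loop_variable_demo():
    """Show that reassigning the loop variable does not disturb the loop."""
    seen = []
    for count in range(2, 7, 2):   # produces 2, 4, 6; 7 itself is never produced
        seen.append(count)
        count = 100                # overwritten by the next value from range
    return seen, count             # count keeps the value last assigned to it
```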
Counter-controlled loops in imperative languages use a counter variable, but such variables do not exist in pure functional languages. Rather than iteration to control repetition, functional languages use recursion. Rather than a statement, functional languages use a recursive function. Counting loops can be simulated in functional languages as follows: The counter can be a parameter for a function that repeatedly executes the loop body, which can be specified in a second function sent to the loop function as a parameter. So, such a loop function takes the body function and the number of repetitions as parameters.
The general form of an F# function for simulating counting loops, named forLoop in this case, is as follows:
let rec forLoop loopBody reps =
    if reps <= 0 then
        ()
    else
        loopBody()
        forLoop loopBody (reps - 1);;
In this function, the parameter loopBody is the function with the body of the loop and the parameter reps is the number of repetitions. The reserved word rec appears before the name of the function to indicate that it is recursive. The empty parentheses do nothing; they are there because in F# an empty statement is illegal and every if must have an else clause.
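The same idea, a counting loop expressed as a recursive function that receives the body as a parameter, can be sketched in Python for readers unfamiliar with F#. The names for_loop and log are illustrative only. Python does not eliminate tail calls, so this form is limited by the interpreter's recursion depth and serves purely as a model of the technique.

```python
def for_loop(loop_body, reps):
    """Call loop_body reps times using recursion instead of iteration."""
    if reps <= 0:
        return None                       # corresponds to () in the F# version
    loop_body()
    return for_loop(loop_body, reps - 1)  # the counter lives in the parameter


# Usage: run a body three times
log = []
for_loop(lambda: log.append("tick"), 3)
```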
In many cases, collections of statements must be repeatedly executed, but the repetition control is based on a Boolean expression rather than a counter. For these situations, a logically controlled loop is convenient. Actually, logically controlled loops are more general than counter-controlled loops. Every counting loop can be built with a logical loop, but the reverse is not true. Also, recall that only selection and logical loops are essential to express the control structure of any flowchart.
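The claim that every counting loop can be built with a logical loop can be illustrated with a small Python sketch. The function count_up and its parameter names follow the loop-parameter terminology used earlier in the chapter but are otherwise hypothetical; a positive stepsize is assumed.

```python
def count_up(initial, terminal, stepsize, body):
    """Counting loop built from a logical (while) loop; assumes stepsize > 0."""
    loop_var = initial
    while loop_var <= terminal:   # the Boolean test replaces the implicit count check
        body(loop_var)
        loop_var += stepsize      # the step is an ordinary assignment in the body
    return loop_var               # value of the loop variable after termination
```

For example, count_up(1, 10, 3, print) would call print with 1, 4, 7, and 10.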
Because they are much simpler than counter-controlled loops, logically controlled loops have fewer design issues.
Should the control be pretest or posttest?
Should the logically controlled loop be a special form of a counting loop or a separate statement?
The C-based programming languages include both pretest and posttest logically controlled loops that are not special forms of their counter-controlled iterative statements. The pretest and posttest logical loops have the following forms:
while (control_expression)
    loop body
and
do
    loop body
while (control_expression);
These two statement forms are exemplified by the following C# code segments:
sum = 0;
indat = Int32.Parse(Console.ReadLine());
while (indat >= 0) {
    sum += indat;
    indat = Int32.Parse(Console.ReadLine());
}

value = Int32.Parse(Console.ReadLine());
do {
    value /= 10;
    digits++;
} while (value > 0);
Note that all variables in these examples are integer type. The ReadLine method of the Console object gets a line of text from the keyboard. Int32.Parse finds the number in its string parameter, converts it to int type, and returns it.
In the pretest version of a logical loop (while), the statement or statement segment is executed as long as the expression evaluates to true. In the posttest version (do), the loop body is executed until the expression evaluates to false. In both cases, the statement can be compound. The operational semantics descriptions of those two statements follow:
while
loop:
    if control_expression is false goto out
    [loop body]
    goto loop
out: . . .
do-while
loop:
    [loop body]
    if control_expression is true goto loop
It is legal in both C and C++ to branch into both while and do loop bodies. The C89 version uses an arithmetic expression for control; in C99 and C++, it may be either arithmetic or Boolean.
Java’s while and do statements are similar to those of C and C++, except the control expression must be boolean type, and because Java does not have a goto, the loop bodies cannot be entered anywhere except at their beginnings.
Posttest loops are infrequently useful and also can be somewhat dangerous, in the sense that programmers sometimes forget that the loop body will always be executed at least once. The syntactic design of placing a posttest control physically after the loop body, where it has its semantic effect, helps avoid such problems by making the logic clear.
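The at-least-once behavior is easiest to see in a case where it is actually wanted, such as the digit-counting loop shown earlier. Python has no posttest loop statement, so the sketch below emulates one with an unconditional loop and a test placed after the body; the function name count_digits is an assumption, and nonnegative integer input is assumed.

```python
def count_digits(value):
    """Count the decimal digits of a nonnegative integer with a posttest loop."""
    digits = 0
    while True:             # emulates do { ... } while (value > 0);
        value //= 10        # integer division, as with int operands in C#
        digits += 1
        if not value > 0:   # the test is reached only after the body has run
            break
    return digits
```

Because the body runs before the first test, an input of 0 is correctly reported as having one digit; a pretest loop with the same condition would report zero.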
A pretest logical loop can be simulated in a purely functional form with a recursive function that is similar to the one used to simulate a counting loop statement in Section 8.3.1.5. In both cases, the loop body is written as a function. Following is the general form of a simulated logical pretest loop, written in F#:
let rec whileLoop test body =
    if test() then
        body()
        whileLoop test body
    else
        ();;
In some situations, it is convenient for a programmer to choose a location for loop control other than the top or bottom of the loop body. As a result, some languages provide this capability. A syntactic mechanism for user-located loop control can be relatively simple, so its design is not difficult. Such loops have the structure of infinite loops but include one or more user-located loop exits. Perhaps the most interesting question is whether a single loop or several nested loops can be exited. The design issues for such a mechanism are the following:
Should the conditional mechanism be an integral part of the exit?
Should only one loop body be exited, or can enclosing loops also be exited?
C, C++, Python, Ruby, and C# have unconditional unlabeled exits (break). Java and Perl have unconditional labeled exits (break in Java, last in Perl).
Following is an example of nested loops in Java, in which there is a break out of the outer loop from the nested loop:
outerLoop:
for (row = 0; row < numRows; row++)
    for (col = 0; col < numCols; col++) {
        sum += mat[row][col];
        if (sum > 1000.0)
            break outerLoop;
    }
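In a language with only unlabeled exits, such as Python, one common way to get the effect of the labeled break above is to place the nested loops in a function and use return to leave both at once. The sketch below assumes this approach; the function name sum_until_exceeds and the limit parameter are illustrative.

```python
def sum_until_exceeds(mat, limit=1000.0):
    """Sum matrix elements row by row, stopping as soon as the sum passes limit."""
    total = 0.0
    for row in mat:
        for value in row:
            total += value
            if total > limit:
                return total   # leaves both loops, like break outerLoop in Java
    return total               # normal completion of the outer loop
```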
C, C++, and Python include an unlabeled control statement, continue, that transfers control to the control mechanism of the smallest enclosing loop. This is not an exit but rather a way to skip the rest of the loop statements on the current iteration without terminating the loop construct. For example, consider the following:
while (sum < 1000) {
    getnext(value);
    if (value < 0) continue;
    sum += value;
}
A negative value causes the assignment statement to be skipped, and control is transferred instead to the conditional at the top of the loop. On the other hand, in
while (sum < 1000) {
    getnext(value);
    if (value < 0) break;
    sum += value;
}
a negative value terminates the loop.
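The two loops can be compared side by side in a runnable form. In this Python sketch the input comes from a list rather than from getnext, and an extra exit on exhausted input is added so that the functions always terminate; both of those details, and the function names, are assumptions not found in the original fragments.

```python
def sum_skipping_negatives(values, limit=1000):
    """Loop with continue: negative inputs are skipped and the loop goes on."""
    total = 0
    source = iter(values)
    while total < limit:
        value = next(source, None)   # stands in for getnext(value)
        if value is None:            # input exhausted; not part of the original
            break
        if value < 0:
            continue                 # back to the test at the top of the loop
        total += value
    return total


def sum_until_negative(values, limit=1000):
    """Loop with break: the first negative input terminates the loop."""
    total = 0
    source = iter(values)
    while total < limit:
        value = next(source, None)
        if value is None or value < 0:
            break
        total += value
    return total
```

Given the inputs 5, -1, 7, the first function returns 12 and the second returns 5.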
Both last and break provide for multiple exits from loops, which may seem to be somewhat of a hindrance to readability. However, unusual conditions that require loop termination are so common that such a statement is justified. Furthermore, readability is not seriously harmed, because the target of all such loop exits is the first statement after the loop (or an enclosing loop) rather than just anywhere in the program. Finally, the alternative of using multiple breaks to leave more than one level of loops is even more detrimental to readability.
The motivation for user-located loop exits is simple: They fulfill a common need for goto statements using a highly restricted branch statement. The target of a goto can be many places in the program, both above and below the goto itself. However, the targets of user-located loop exits must be below the exit and can only follow immediately at the end of a compound statement.
A general data-based iteration statement uses a user-defined data structure and a user-defined function (the iterator) to go through the structure’s elements. The iterator is called at the beginning of each iteration, and each time it is called, the iterator returns an element from a particular data structure in some specific order. For example, suppose a program has a user-defined binary tree of data nodes, and the data in each node must be processed in some particular order. A user-defined iteration statement for the tree would successively set the loop variable to point to the nodes in the tree, one for each iteration. The initial execution of the user-defined iteration statement needs to issue a special call to the iterator to get the first tree element. The iterator must always remember which node it presented last so that it visits all nodes without visiting any node more than once. So an iterator must be history sensitive. A user-defined iteration statement terminates when the iterator fails to find more elements.
The for statement of the C-based languages, because of its great flexibility, can be used to simulate a user-defined iteration statement. Once again, suppose the nodes of a binary tree are to be processed. If the tree root is pointed to by a variable named root, and if traverse is a function that sets its parameter to point to the next element of a tree in the desired order, the following could be used:
for (ptr = root; ptr != null; ptr = traverse(ptr)) {
    . . .
}
In this statement, traverse is the iterator.
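What it means for an iterator to be history sensitive can be sketched with a small binary tree in Python. The classes Node and InorderIterator and the function process_tree are hypothetical; inorder traversal is chosen here only as one example of a "desired order", and the iterator keeps its position in an explicit stack between calls.

```python
class Node:
    def __init__(self, data, left=None, right=None):
        self.data = data
        self.left = left
        self.right = right


class InorderIterator:
    """Remembers where it stopped so that each node is presented exactly once."""

    def __init__(self, root):
        self._stack = []              # the iterator's history
        self._push_left(root)

    def _push_left(self, node):
        while node is not None:
            self._stack.append(node)
            node = node.left

    def next_node(self):
        """Return the next node in inorder, or None when none remain."""
        if not self._stack:
            return None
        node = self._stack.pop()
        self._push_left(node.right)
        return node


def process_tree(root):
    visited = []
    it = InorderIterator(root)
    ptr = it.next_node()              # initial call obtains the first element
    while ptr is not None:            # the loop ends when no elements remain
        visited.append(ptr.data)
        ptr = it.next_node()          # plays the role of ptr = traverse(ptr)
    return visited
```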
User-defined iteration statements are more important in object-oriented programming than they were in earlier software development paradigms, because users of object-oriented programming routinely use abstract data types for data structures, especially collections. In such cases, a user-defined iteration statement and its iterator must be provided by the author of the data abstraction because the representation of the objects of the type is not known to the user.
An enhanced version of the for statement was added to Java in Java 5.0. This statement simplifies iterating through the values in an array or objects in a collection that implements the Iterable interface. (All of the predefined generic collections in Java implement Iterable.) For example, if we had an ArrayList collection of strings named myList, the following statement would iterate through all of its elements, setting each to myElement:
for (String myElement : myList) { . . . }
This new statement is referred to as “foreach,” although its reserved word is for.
C# and F# (and the other .NET languages) also have generic library classes for collections. For example, there are generic collection classes for lists, which are dynamic length arrays, stacks, queues, and dictionaries (hash table). All of these predefined generic collections have built-in iterators that are used implicitly with the foreach statement. Furthermore, users can define their own collections and write their own iterators, which can implement the IEnumerator interface, which enables the use of foreach on these collections.
For example, consider the following C# code:
List<String> names = new List<String>();
names.Add("Bob");
names.Add("Carol");
names.Add("Alice");
. . .
foreach (String name in names)
    Console.WriteLine(name);
In Ruby, a block is a sequence of code, delimited by either braces or the do and end reserved words. Blocks can be used with specially written methods to create many useful constructs, including iterators for data structures. This construct consists of a method call followed by a block. A block is actually an anonymous method that is sent to the method (whose call precedes it) as a parameter. The called method can then call the block, which can produce output or objects.
Ruby predefines several iterator methods, such as times and upto for counter-controlled loops, and each for simple iterations of arrays and hashes. For example, consider the following example of using times:
>> 4.times {puts "Hey!"}
Hey!
Hey!
Hey!
Hey!
=> 4
Note that >> is the prompt of the interactive Ruby interpreter and => is used to indicate the return value of the expression. The Ruby puts statement displays its parameter. In this example, the times method is sent to the object 4, with the block sent along as a parameter. The times method calls the block four times, producing the four lines of output. The destination object, 4, is the return value from times.
The most common Ruby iterator is each, which is often used to go through arrays and apply a block to each element. For this purpose, it is convenient to allow blocks to have parameters, which, if present, appear at the beginning of the block, delimited by vertical bars (|). The following example, which uses a block parameter, illustrates the use of each:
>> list = [2, 4, 6, 8]
=> [2, 4, 6, 8]
>> list.each {|value| puts value}
2
4
6
8
=> [2, 4, 6, 8]
In this example, the block is called for each element of the array to which the each method is sent. The block produces the output, which is a list of the array’s elements. The return value of each is the array to which it is sent.
Instead of a counting loop, Ruby has the upto method. For example, we could have the following:
1.upto(5) {|x| print x, " "}
This produces the following output:
1 2 3 4 5
Syntax that resembles a for loop in other languages could also be used, as in the following:
for x in 1..5
    print x, " "
end
Ruby actually has no for statement—constructs like the above are converted by Ruby into upto method calls.
Now we consider how blocks work. The yield statement is similar to a method call, except that there is no receiver object and the call is a request to execute the block attached to the method call, rather than a call to a method. yield is only called in a method that has been called with a block. If the block has parameters, they are specified in parentheses in the yield statement. The value returned by a block is that of the last expression evaluated in the block. It is this process that is used to implement the built-in iterators, such as times.
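The block mechanism can be approximated in Python by passing the block as an ordinary function argument and calling it where the Ruby method would use yield. This is only an analogy, not how Ruby implements its iterators; the functions times and each below are hypothetical stand-ins that mimic the return values shown in the earlier interpreter sessions.

```python
def times(n, block):
    """Call block once for each of 0 .. n-1, then return n."""
    for i in range(n):
        block(i)          # corresponds to yield inside the Ruby method
    return n              # the destination object is the return value


def each(items, block):
    """Call block with every element, then return the collection itself."""
    for item in items:
        block(item)       # the argument becomes the block parameter
    return items
```

For example, times(4, lambda i: print("Hey!")) prints four lines and returns 4.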
Python provides strong support for iteration. Suppose one needs to process the nodes in some user-defined data structure. Further suppose that the structure has a traversal method that goes through the nodes of the structure in the desired order. The following skeletal class definition includes such a traversal method that produces the nodes of an instance of this class, one at a time.
class MyStructure:
    # Other method definitions, including a constructor
    def traverse(self):
        # if there is another node:
        #     set nod to next node
        # else:
        #     return
        yield nod
The traverse method appears to be a regular Python method, but it contains a yield statement, which dramatically changes the semantics of the method. In effect, the method is run in a separate thread of control. The yield statement acts like a return. On the first call to traverse, yield returns the initial node of the structure. However, on the second call, it returns the second node. On all but the first call to traverse, it begins its execution where it left off on the previous execution. Instead of restarting at its beginning, it is resumed. Any local storage in such a method is maintained across its calls. In the case of traverse, subsequent calls resume execution immediately after the yield, in the state the method was in when it last yielded. In Python, any method that contains a yield statement is called a generator, because it generates data one element at a time.
Of course, one could also produce all of the nodes of the structure, store them in an array, and process them from the array. However, the number of nodes could be large, requiring a large array to store them. The approach using the iterator is more elegant and is not affected by the size of the data structure.
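A runnable version of the skeletal class might look like the following sketch. It assumes, purely for illustration, that the structure is a binary search tree whose nodes are three-element lists and that the desired order is ascending; the insert method and the node layout are not part of the original skeleton.

```python
class MyStructure:
    """Hypothetical structure: a binary search tree of comparable values."""

    def __init__(self):
        self._root = None   # each node is a list [left, value, right]

    def insert(self, value):
        if self._root is None:
            self._root = [None, value, None]
            return
        node = self._root
        while True:
            side = 0 if value < node[1] else 2
            if node[side] is None:
                node[side] = [None, value, None]
                return
            node = node[side]

    def traverse(self):
        """Generator: produces the values in ascending order, one per request."""
        stack, node = [], self._root
        while stack or node is not None:
            while node is not None:
                stack.append(node)
                node = node[0]
            node = stack.pop()
            yield node[1]        # execution pauses here until the next request
            node = node[2]
```

A loop such as for value in s.traverse(): . . . then processes one element at a time, and the local variables stack and node survive between requests without the whole sequence ever being stored.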
An unconditional branch statement transfers execution control to a specified location in the program. The most heated debate in language design in the late 1960s was over the issue of whether unconditional branching should be part of any high-level language, and if so, whether its use should be restricted. The unconditional branch, or goto, is the most powerful statement for controlling the flow of execution of a program’s statements. However, careless use of the goto can lead to serious problems. The goto has stunning power and great flexibility (all other control structures can be built with goto and a selector), but it is this power that makes its use dangerous. Without usage restrictions, imposed by either language design or programming standards, goto statements can make programs very difficult to read, and as a result, highly unreliable and costly to maintain.
Although several thoughtful people had pointed out the potential problems of gotos earlier, it was Edsger Dijkstra who gave the computing world the first widely read exposé on the dangers of the goto. In his letter he noted, “The goto statement as it stands is just too primitive; it is too much an invitation to make a mess of one’s program” (Dijkstra, 1968a). During the first few years after publication of Dijkstra’s views on the goto, a large number of people argued publicly for either outright banishment or at least restrictions on the use of the goto. Among those who did not favor complete elimination was Donald Knuth (1974), who argued that there were occasions when the efficiency of the goto outweighed its harm to readability.
These problems follow directly from a goto’s ability to force any program statement to follow any other in execution sequence, regardless of whether that statement precedes or follows the previously executed statement in textual order. Readability is best when the execution order of statements in a program is nearly the same as the order in which they appear—in our case, this would mean top to bottom, which is the order to which we are accustomed. Thus, restricting gotos so they can transfer control only downward in a program partially alleviates the problem. It allows gotos to transfer control around code sections in response to errors or unusual conditions but disallows their use to build any sort of loop.
A few languages have been designed without a goto—for example, Java, Python, and Ruby. However, most currently popular languages include a goto statement. Kernighan and Ritchie (1978) call the goto infinitely abusable, but it is nevertheless included in Ritchie’s language, C. The languages that have eliminated the goto have provided additional control statements, usually in the form of loop exits, to code one of the justifiable applications of the goto.
The relatively new language, C#, includes a goto, even though one of the languages on which it is based, Java, does not. One legitimate use of C#’s goto is in the switch statement, as discussed in Section 8.2.2.2.
All of the loop exit statements discussed in Section 8.3.3 are actually camouflaged goto statements. They are, however, severely restricted gotos and are not harmful to readability. In fact, it can be argued that they improve readability, because avoiding them results in convoluted and unnatural code that is much more difficult to understand.
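To illustrate the readability argument, here is a minimal Python sketch of the same search written twice: once with a break, and once with only a pretest logical loop and a flag. The function names and the sample data are hypothetical.

```python
def first_negative_with_break(numbers):
    """Return the index of the first negative value, or -1 if there is none."""
    position = -1
    for index, value in enumerate(numbers):
        if value < 0:
            position = index
            break  # restricted transfer of control out of the loop
    return position


def first_negative_without_break(numbers):
    """Same search using only a pretest logical loop and a flag variable."""
    position = -1
    index = 0
    found = False
    while index < len(numbers) and not found:
        if numbers[index] < 0:
            position = index
            found = True
        else:
            index += 1
    return position


data = [4, 7, -2, 9, -5]
```

Both versions compute the same result. The second needs an extra variable and a compound loop condition to express what break states directly.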
Quite different forms of selection and loop structures were suggested by Dijkstra (1975). His primary motivation was to provide control statements that would support a program design methodology that ensured correctness during development rather than when verifying or testing completed programs. This methodology is described in Dijkstra (1976). Another motivation is the increased clarity in reasoning that is possible with guarded commands. Simply put, a selectable segment of a selection statement in a guarded-command statement can be considered independently of any other part of the statement, which is not true for the selection statements of the common programming languages.
Guarded commands are covered in this chapter because they are the basis for the linguistic mechanism developed later for concurrent programming in CSP (Hoare, 1978). Guarded commands are also used to define functions in Haskell, as discussed in Chapter 15.
Dijkstra’s selection statement has the form
if <Boolean expression> -> <statement>
[] <Boolean expression> -> <statement>
[] . . .
[] <Boolean expression> -> <statement>
fi
The closing reserved word, fi, is the opening reserved word spelled backward. This form of closing reserved word is taken from ALGOL 68. The small blocks, called fatbars, are used to separate the guarded clauses and allow the clauses to be statement sequences. Each line in the selection statement, consisting of a Boolean expression (a guard) and a statement or statement sequence, is called a guarded command.
This selection statement has the appearance of a multiple selection, but its semantics is different. All of the Boolean expressions are evaluated each time the statement is reached during execution. If more than one expression is true, one of the corresponding statements can be nondeterministically chosen for execution. An implementation might always choose the statement associated with the first Boolean expression that evaluates to be true. But it may choose any statement associated with a true Boolean expression. So, the correctness of the program cannot depend on which statement is chosen (among those associated with true Boolean expressions). If none of the Boolean expressions are true, a run-time error occurs that causes program termination. This forces the programmer to consider and list all possibilities. Consider the following example:
if i = 0 -> sum := sum + i
[] i > j -> sum := sum + j
[] j > i -> sum := sum + k
fi
If i = 0 and j > i, this statement chooses nondeterministically between the first and third assignment statements. If i is equal to j and is not zero, a run-time error occurs because none of the conditions are true.
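The semantics just described can be simulated in Python. This is only a sketch under stated assumptions: guarded_if and example are hypothetical helper names, each guarded command is modeled as a pair of zero-argument functions, and random.choice stands in for the nondeterministic choice (an implementation is free to pick any true guard).

```python
import random


def guarded_if(*guarded_commands):
    """Run one command whose guard is true, chosen arbitrarily.

    Each guarded command is a (guard, command) pair of zero-argument callables.
    """
    # All guards are evaluated each time the statement is reached.
    enabled = [command for guard, command in guarded_commands if guard()]
    if not enabled:
        # No guard is true: the statement fails at run time.
        raise RuntimeError("no guard is true")
    return random.choice(enabled)()


def example(i, j, k, total):
    """The three-clause example from the text; returns the new value of sum."""
    return guarded_if(
        (lambda: i == 0, lambda: total + i),
        (lambda: i > j, lambda: total + j),
        (lambda: j > i, lambda: total + k),
    )
```

With i = 0 and j > i, example may return either total + i or total + k. With i equal to j and nonzero, it raises RuntimeError.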
This statement can be an elegant way of allowing the programmer to state that the order of execution, in some cases, is irrelevant. For example, to find the larger of two numbers, we can use
if x >= y -> max := x
[] y >= x -> max := y
fi
This computes the desired result without overspecifying the solution. In particular, if x and y are equal, it does not matter which we assign to max. This is a form of abstraction provided by the nondeterministic semantics of the statement.
Now, consider this same process coded in a traditional programming language selector:
if (x >= y)
    max = x;
else
    max = y;
This could also be coded as follows:
if (x > y)
    max = x;
else
    max = y;
There is no practical difference between these two statements. The first assigns x to max when x and y are equal; the second assigns y to max in the same circumstance. This choice between the two statements complicates the formal analysis of the code and the correctness proof of it. This is one of the reasons why guarded commands were developed by Dijkstra.
The loop structure proposed by Dijkstra has the form
do <Boolean expression> -> <statement>
[] <Boolean expression> -> <statement>
[] . . .
[] <Boolean expression> -> <statement>
od
The semantics of this statement is that all Boolean expressions are evaluated on each iteration. If more than one is true, one of the associated statements is nondeterministically (perhaps randomly) chosen for execution, after which the expressions are again evaluated. When all expressions are simultaneously false, the loop terminates.
Consider the following problem: Given four integer variables, q1, q2, q3, and q4, rearrange the values of the four so that q1 ≤ q2 ≤ q3 ≤ q4. Without guarded commands, one straightforward solution is to put the four values into an array, sort the array, and then assign the values from the array back into the scalar variables q1, q2, q3, and q4. While this solution is not difficult, it requires a good deal of code, especially if the sort process must be included.
Now, consider the following code, which uses guarded commands to solve the same problem but in a more concise and elegant way.
do q1 > q2 -> temp := q1; q1 := q2; q2 := temp;
[] q2 > q3 -> temp := q2; q2 := q3; q3 := temp;
[] q3 > q4 -> temp := q3; q3 := q4; q4 := temp;
od
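The guarded loop above can also be simulated in Python. As before, this is only a sketch: guarded_do and sort_four are hypothetical names, the four variables are held in a list so the commands can update them, and random.choice stands in for the nondeterministic choice among true guards.

```python
import random


def guarded_do(*guarded_commands):
    """Repeatedly run one enabled command; stop when every guard is false."""
    while True:
        # All guards are re-evaluated on each iteration.
        enabled = [command for guard, command in guarded_commands if guard()]
        if not enabled:
            return
        random.choice(enabled)()


def sort_four(q1, q2, q3, q4):
    q = [q1, q2, q3, q4]

    def out_of_order(a, b):
        # Guard: true while the adjacent pair is in the wrong order.
        return lambda: q[a] > q[b]

    def swap(a, b):
        # Command: exchange the adjacent pair.
        def command():
            q[a], q[b] = q[b], q[a]
        return command

    guarded_do(
        (out_of_order(0, 1), swap(0, 1)),
        (out_of_order(1, 2), swap(1, 2)),
        (out_of_order(2, 3), swap(2, 3)),
    )
    return tuple(q)
```

Whichever enabled swap is chosen, each one removes exactly one out-of-order pair, so the loop terminates with the values in nondecreasing order.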
Dijkstra’s guarded command control statements are interesting, in part because they illustrate how the syntax and semantics of statements can have an impact on program verification and vice versa. Program verification is virtually impossible when goto statements are used. Verification is greatly simplified if (1) only logical loops and selections are used or (2) only guarded commands are used. The axiomatic semantics of guarded commands is conveniently specified (Gries, 1981). It should be obvious, however, that the guarded commands are considerably more complex to implement than their conventional deterministic counterparts.
We have described and discussed a variety of statement-level control structures. A brief evaluation now seems to be in order.
First, we have the theoretical result that only sequence, selection, and pretest logical loops are absolutely required to express computations (Böhm and Jacopini, 1966). This result has been used by those who wish to ban unconditional branching altogether. Of course, there are already sufficient practical problems with the goto to condemn it without also using a theoretical reason. One of the main legitimate needs for gotos—premature exits from loops—can be met with restricted branch statements, such as break.
One obvious misuse of the Böhm and Jacopini result is to argue against the inclusion of any control structures beyond selection and pretest logical loops. No widely used language has yet taken that step; furthermore, we doubt that any ever will, because of the negative effect on writability and readability. Programs written with only selection and pretest logical loops are generally less natural in structure, more complex, and therefore harder to write and more difficult to read. For example, the C# multiple selection structure is a great boost to C# writability, with no obvious negatives. Another example is the counting loop structure of many languages, especially when the statement is simple.
It is not so clear that the utility of many of the other control structures that have been proposed is worth their inclusion in languages (Ledgard and Marcotty, 1975). This question rests to a large degree on the fundamental question of whether the size of languages must be minimized. Both Wirth (1975) and Hoare (1973) strongly endorse simplicity in language design. In the case of control structures, simplicity means that only a few control statements should be in a language, and they should be simple.
The rich variety of statement-level control structures that have been invented shows the diversity of opinion among language designers. After all the invention, discussion, and evaluation, there is still no unanimity of opinion on the precise set of control statements that should be in a language. Most contemporary languages do, of course, have similar control statements, but there is still some variation in the details of their syntax and semantics. Furthermore, there is still disagreement on whether a language should include a goto; C++ and C# do, but Java and Ruby do not.
Control statements occur in several categories: selection, multiple selection, iterative, and unconditional branching.
The switch statement of the C-based languages is representative of multiple-selection statements. The C# version eliminates the reliability problem of its predecessors by disallowing the implicit continuation from a selected segment to the following selectable segment.
A large number of different loop statements have been invented for high-level languages. C’s for statement is the most flexible iteration statement, although its flexibility leads to some reliability problems.
Most languages have exit statements for their loops; these statements take the place of one of the most common uses of goto statements.
Data-based iterators are loop statements for processing data structures, such as linked lists, hashes, and trees. The for statement of the C-based languages allows the user to create iterators for user-defined data. The foreach statement of Perl and C# is a predefined iterator for standard data structures. In the contemporary object-oriented languages, iterators for collections are specified with standard interfaces, which are implemented by the designers of the collections.
Ruby includes iterators that are a special form of methods that are sent to various objects. The language predefines iterators for common uses, but also allows user-defined iterators.
The unconditional branch, or goto, has been part of most imperative languages. Its problems have been widely discussed and debated. The current consensus is that it should remain in most languages but that its dangers should be minimized through programming discipline.
Dijkstra’s guarded commands are alternative control statements with positive theoretical characteristics. Although they have not been adopted as the control statements of a language, part of their semantics appears in the concurrency mechanisms of CSP and the function definitions of Haskell.
What is the definition of control structure?
What did Böhm and Jacopini prove about flowcharts?
What is the definition of block?
What is/are the design issue(s) for all selection and iteration control statements?
What are the design issues for selection structures?
What is unusual about Python’s design of compound statements?
Under what circumstances must an F# selector have an else clause?
What are the common solutions to the nesting problem for two-way selectors?
What are the design issues for multiple-selection statements?
Between what two language characteristics is a trade-off made when deciding whether more than one selectable segment is executed in one execution of a multiple selection statement?
What is unusual about C’s multiple-selection statement?
On what previous language was C’s switch statement based?
Explain how C#’s switch statement is safer than that of C.
What are the design issues for all iterative control statements?
What are the design issues for counter-controlled loop statements?
What is a pretest loop statement? What is a posttest loop statement?
What is the difference between the for statement of C++ and that of Java?
In what way is C’s for statement more flexible than that of many other languages?
What does the range function in Python do?
What contemporary languages do not include a goto?
What are the design issues for logically controlled loop statements?
What is the main reason user-located loop control statements were invented?
What are the design issues for user-located loop control mechanisms?
What advantage does Java’s break statement have over C’s break statement?
What are the differences between the break statement of C++ and that of Java?
What is a user-defined iteration control?
What Scheme function implements a multiple selection statement?
How does a functional language implement repetition?
How are iterators implemented in Ruby?
What language predefines iterators that can be explicitly called to iterate over its predefined data structures?
What common programming language borrows part of its design from Dijkstra’s guarded commands?
Describe three situations where a combined counting and logical looping statement is needed.
Study the iterator feature of CLU in Liskov et al. (1981) and determine its advantages and disadvantages.
Compare the set of Ada control statements with those of C# and decide which are better and why.
What are the pros and cons of using unique closing reserved words on compound statements?
What are the arguments, pros and cons, for Python’s use of indentation to specify compound statements in control statements?
Analyze the potential readability problems with using closure reserved words for control statements that are the reverse of the corresponding initial reserved words, such as the case-esac reserved words of ALGOL 68. For example, consider common typing errors such as transposing characters.
Use the Science Citation Index to find an article that refers to Knuth (1974). Read the article and Knuth’s paper and write a paper that summarizes both sides of the goto issue.
In his paper on the goto issue, Knuth (1974) suggests a loop control statement that allows multiple exits. Read the paper and write an operational semantics description of the statement.
What are the arguments both for and against the exclusive use of Boolean expressions in the control statements in Java (as opposed to also allowing arithmetic expressions, as in C++)?
Describe a programming situation in which the else clause in Python’s for statement would be convenient.
Describe three specific programming situations that require a posttest loop.
Speculate as to the reason control can be transferred into a C loop statement.
Rewrite the following pseudocode segment using a loop structure in the specified languages:
    k = (j + 13) / 27
loop:
    if k > 10 then goto out
    k = k + 1
    i = 3 * k - 1
    goto loop
out: . . .
C, C++, Java, or C#
Python
Ruby
Assume all variables are integer type. Discuss which language, for this code, has the best writability, the best readability, and the best combination of the two.
Redo Programming Exercise 1, except this time make all the variables and constants floating-point type, and change the statement
k = k + 1
to
k = k + 1.2
Rewrite the following code segment using a multiple-selection statement in the following languages:
if ((k == 1) || (k == 2)) j = 2 * k - 1
if ((k == 3) || (k == 5)) j = 3 * k + 1
if (k == 4) j = 4 * k - 1
if ((k == 6) || (k == 7) || (k == 8)) j = k - 2
C, C++, Java, or C#
Python
Ruby
Assume all variables are integer type. Discuss the relative merits of the use of these languages for this particular code.
Consider the following C program segment. Rewrite it using no gotos or breaks.
j = -3;
for (i = 0; i < 3; i++) {
    switch (j + 2) {
        case 3:
        case 2: j--; break;
        case 0: j += 2; break;
        default: j = 0;
    }
    if (j > 0) break;
    j = 3 - i;
}
In a letter to the editor of CACM, Rubin (1987) uses the following code segment as evidence that the readability of some code with gotos is better than the equivalent code without gotos. This code finds the first row of an n by n integer matrix named x that has nothing but zero values.
for (i = 1; i <= n; i++) {
    for (j = 1; j <= n; j++)
        if (x[i][j] != 0)
            goto reject;
    println ('First all-zero row is:', i);
    break;
reject:
}
Rewrite this code without gotos in one of the following languages: C, C++, Java, or C#. Compare the readability of your code to that of the example code.
Consider the following programming problem: The values of three integer variables—first, second, and third—must be placed in the three variables max, mid, and min, with the obvious meanings, without using arrays or user-defined or predefined subprograms. Write two solutions to this problem, one that uses nested selections and one that does not. Compare the complexity and expected reliability of the two.
Rewrite the C program segment of Programming Exercise 4 using if and goto statements in C.
Rewrite the C program segment of Programming Exercise 4 in Java without using a switch statement.
Translate the following call to Scheme’s COND to C and set the resulting value to y.
(COND
    ((> x 10) x)
    ((< x 5) (* 2 x))
    ((= x 7) (+ x 10))
)
Subprograms are the fundamental building blocks of programs and are therefore among the most important concepts in programming language design. We now explore the design of subprograms, including parameter-passing methods, local referencing environments, overloaded subprograms, generic subprograms, and the aliasing and problematic side effects that are associated with subprograms. We also include discussions of indirectly called subprograms, closures, and coroutines.
Implementation methods for subprograms are discussed in Chapter 10.
Two fundamental abstraction facilities can be included in a programming language: process abstraction and data abstraction. In the early history of high-level programming languages, only process abstraction was included. Process abstraction, in the form of subprograms, has been a central concept in all programming languages. In the 1980s, however, many people began to believe that data abstraction was equally important. Data abstraction is discussed in detail in Chapter 11.
The earliest design for a programmable computer, Babbage’s Analytical Engine, was developed in the 1830s and 1840s but never completed; it provided for reusing collections of instruction cards at several different places in a program. In a modern programming language, such a collection of statements is written as a subprogram. This reuse results in savings in memory space and coding time. Such reuse is also an abstraction, for the details of the subprogram’s computation are replaced in a program by a statement that calls the subprogram. Instead of describing how some computation is to be done at each place it is needed, the program enacts that description (the collection of statements in the subprogram) with a call statement, effectively abstracting away the details. This increases the readability of a program by emphasizing its logical structure while hiding its low-level details.
The methods of object-oriented languages are closely related to the subprograms discussed in this chapter. The primary way methods differ from subprograms is the way they are called and their associations with classes and objects. Although these special characteristics of methods are discussed in Chapter 12, the features they share with subprograms, such as parameters and local variables, are discussed in this chapter.
All subprograms discussed in this chapter, except the coroutines described in Section 9.13, have the following characteristics:
Each subprogram has a single entry point.
The calling program unit is suspended during the execution of the called subprogram, which implies that there is only one subprogram in execution at any given time.
Control always returns to the caller when the subprogram execution terminates.
Alternatives to these result in coroutines and concurrent units (Chapter 13).
Most subprograms have names, although some are anonymous. Section 9.12 has examples of anonymous subprograms in C#.
A subprogram definition describes the interface to and the actions of the subprogram abstraction. A subprogram call is the explicit request that a specific subprogram be executed. A subprogram is said to be active if, after having been called, it has begun execution but has not yet completed that execution. The two fundamental kinds of subprograms, procedures and functions, are defined and discussed in Section 9.2.4.
A subprogram header, which is the first part of the definition, serves several purposes. First, it specifies that the following syntactic unit is a subprogram definition of some particular kind. In languages that have more than one kind of subprogram, the kind of the subprogram is usually specified with a special word. Second, if the subprogram is not anonymous, the header provides a name for the subprogram. Third, it may specify a list of parameters.
Consider the following header examples:
def adder (parameters):
This is the header of a Python subprogram named adder. Ruby subprogram headers also begin with def. The header of a JavaScript subprogram begins with function.
In C, the header of a function named adder might be as follows:
void adder (parameters)
The reserved word void in this header indicates that the subprogram does not return a value.
The body of a subprogram defines its actions. In the C-based languages (and some others, such as JavaScript) the body of a subprogram is delimited by braces. In Ruby, an end statement terminates the body of a subprogram. As with compound statements, the statements in the body of a Python function must be indented, and the end of the body is indicated by the first statement that is not indented.
One characteristic of Python functions that sets them apart from the functions of other common programming languages is that function def statements are executable. When a def statement is executed, it assigns the given name to the given function body. Until a function’s def has been executed, the function cannot be called. Consider the following skeletal example:
if . . . :
  def fun( . . . ):
    . . .
else:
  def fun( . . . ):
    . . .
If the then clause of this selection construct is executed, that version of the function fun can be called, but not the version in the else clause. Likewise, if the else clause is chosen, its version of the function can be called but the one in the then clause cannot.
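The effect of an executable def can be seen in a small runnable sketch. The flag use_then_clause and the two function bodies are hypothetical, chosen only to show that the name fun is bound by whichever def statement actually executes.

```python
use_then_clause = True  # hypothetical run-time condition

if use_then_clause:
    def fun():
        # Bound only if the then clause executes.
        return "then version"
else:
    def fun():
        # Bound only if the else clause executes.
        return "else version"

# Only the def that was executed has bound the name fun.
result = fun()
print(result)
```

Calling fun before the selection construct runs would raise a NameError, because no def for it has been executed yet.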
Ruby methods differ from the subprograms of other programming languages in several interesting ways. Ruby methods are often defined in class definitions but can also be defined outside class definitions, in which case they are considered methods of the root object, Object. Such methods can be called without an object receiver, as if they were functions in C or C++. If a Ruby method is called without a receiver, self is assumed. If there is no method by that name in the class, enclosing classes are searched, up to Object, if necessary.
The parameter profile of a subprogram contains the number, order, and types of its formal parameters. The protocol of a subprogram is its parameter profile plus, if it is a function, its return type. In languages in which subprograms have types, those types are defined by the subprogram’s protocol.
Subprograms can have declarations as well as definitions. This parallels variable declarations and definitions in C, in which declarations provide type information but do not define variables. A subprogram declaration provides the subprogram’s protocol but does not include its body. Declarations are necessary in languages that do not allow forward references to subprograms. For both variables and subprograms, declarations are needed for static type checking. In the case of subprograms, it is the types of the parameters that must be checked. Function declarations are common in C and C++ programs, where they are called prototypes. Such declarations are often placed in header files.
In most other languages (other than C and C++), subprograms do not need declarations, because there is no requirement that subprograms be defined before they are called.
Subprograms typically describe computations. There are two ways that a nonmethod subprogram can gain access to the data that it is to process: through direct access to nonlocal variables (declared elsewhere but visible in the subprogram) or through parameter passing. Data passed through parameters are accessed using names that are local to the subprogram. Parameter passing is more flexible than direct access to nonlocal variables. In essence, a subprogram with parameter access to the data that it is to process is a parameterized computation. It can perform its computation on whatever data it receives through its parameters (presuming the types of the parameters are as expected by the subprogram). If data access is through nonlocal variables, the only way the computation can proceed on different data is to assign new values to those nonlocal variables between calls to the subprogram. Extensive access to nonlocals can reduce reliability. Variables that are visible to the subprogram where access is desired often end up also being visible where access to them is not needed. This problem was discussed in Chapter 5.
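The contrast between the two kinds of data access can be sketched in Python. The names data, total_nonlocal, and total are hypothetical and serve only to illustrate the point.

```python
# Nonlocal access: the function reads a module-level variable.
data = [3, 1, 2]  # hypothetical global

def total_nonlocal():
    return sum(data)

# Parameter access: the same computation, parameterized.
def total(values):
    return sum(values)

first = total([3, 1, 2])   # works on whatever data is passed
second = total([10, 20])

# To run total_nonlocal on other data, the global must be
# reassigned between calls.
data = [10, 20]
third = total_nonlocal()
```

The parameterized version can be applied to any list at each call, whereas the nonlocal version depends on the caller updating shared state first.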
Although methods also access external data through nonlocal references and parameters, the primary data to be processed by a method is the object through which the method is called. However, when a method does access nonlocal data, the reliability problems are the same as with nonmethod subprograms. Also, in an object-oriented language, method access to class variables (those associated with the class, rather than an object) is related to the concept of nonlocal data and should be avoided whenever possible. In this case, as well as the case of a C function accessing nonlocal data, the method can have the side effect of changing something other than its parameters or local data. Such changes complicate the semantics of the method and make it less reliable.
Pure functional programming languages, such as Haskell, do not have mutable data, so functions written in them are unable to change memory in any way—they simply perform calculations and return a resulting value (or function, since functions are values in a pure functional language).
In some situations, it is convenient to be able to transmit computations, rather than data, as parameters to subprograms. In these cases, the name of the subprogram that implements that computation may be used as a parameter. This form of parameter is discussed in Section 9.6. Data parameters are discussed in Section 9.5.
The parameters in the subprogram header are called formal parameters. They are sometimes thought of as dummy variables because they are not variables in the usual sense: In most cases, they are bound to storage only when the subprogram is called, and that binding is often through some other program variables.
Subprogram call statements must include the name of the subprogram and a list of parameters to be bound to the formal parameters of the subprogram. These parameters are called actual parameters.2 They must be distinguished from formal parameters, because the two usually have different restrictions on their forms, and of course, their uses are quite different.
In most programming languages, the correspondence between actual and formal parameters—or the binding of actual parameters to formal parameters—is done by position: The first actual parameter is bound to the first formal parameter and so forth. Such parameters are called positional parameters. This is an effective and safe method of relating actual parameters to their corresponding formal parameters, as long as the parameter lists are relatively short.
When parameter lists are long, however, it is easy for a programmer to make mistakes in the order of actual parameters in the list. One solution to this problem is to provide keyword parameters, in which the name of the formal parameter to which an actual parameter is to be bound is specified with the actual parameter in a call. The advantage of keyword parameters is that they can appear in any order in the actual parameter list. Python functions can be called using this technique, as in
sumer(length = my_length,
      list = my_array,
      sum = my_sum)
where the definition of sumer has the formal parameters length, list, and sum.
The disadvantage of keyword parameters is that the user of the subprogram must know the names of the formal parameters.
In addition to keyword parameters, some languages, for example Python, allow positional parameters. Keyword and positional parameters can be mixed in a call, as in
sumer(my_length,
      sum = my_sum,
      list = my_array)
The only restriction with this approach is that after a keyword parameter appears in the list, all remaining parameters must be keyworded. This restriction is necessary because a position may no longer be well defined after a keyword parameter has appeared.
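The two call forms and the ordering restriction can be tried in a short sketch. The body of sumer is an assumption (the text gives only its parameter names), as are the sample values.

```python
def sumer(length, list, sum):
    # Parameter names follow the text; inside this function they
    # shadow the built-ins list and sum.
    total = sum
    for i in range(length):
        total += list[i]
    return total

my_array = [1, 2, 3, 4]
my_length = len(my_array)
my_sum = 0

# All keyword parameters, in any order.
all_keywords = sumer(sum=my_sum, list=my_array, length=my_length)

# One positional parameter followed by keyword parameters.
mixed = sumer(my_length, sum=my_sum, list=my_array)

# A positional parameter after a keyword parameter is rejected
# when the call is compiled.
rejected = False
try:
    compile("sumer(length=4, my_array, 0)", "<example>", "eval")
except SyntaxError:
    rejected = True
```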
In Python, Ruby, C++, and PHP, formal parameters can have default values. A default value is used if no actual parameter is passed to the formal parameter in the subprogram header. Consider the following Python function header:
def compute_pay(income, exemptions = 1, tax_rate)
The exemptions formal parameter can be absent in a call to compute_pay; when it is, the value 1 is used. No comma is included for an absent actual parameter in a Python call, because the only value of such a comma would be to indicate the position of the next parameter, which in this case is not necessary because all actual parameters after an absent actual parameter must be keyworded. For example, consider the following call:
pay = compute_pay(20000.0, tax_rate = 0.15)
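Current Python implementations reject the compute_pay header as written, because a parameter without a default may not follow one with a default unless it is keyword-only. One way to obtain the behavior described is to mark tax_rate as keyword-only with a bare asterisk, as in the following sketch. The pay formula in the body is a hypothetical placeholder.

```python
def compute_pay(income, exemptions=1, *, tax_rate):
    # Hypothetical formula, for illustration only.
    taxable = income - 1000.0 * exemptions
    return taxable * (1.0 - tax_rate)

# exemptions is omitted, so its default value 1 is used;
# tax_rate must be passed by keyword.
pay = compute_pay(20000.0, tax_rate=0.15)

# exemptions supplied positionally.
pay_two = compute_pay(20000.0, 2, tax_rate=0.15)
```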
In C++, which does not support keyword parameters, the rules for default parameters are necessarily different. The default parameters must appear last, because parameters are positionally associated. Once a default parameter is omitted in a call, all remaining formal parameters must have default values. A C++ function header for the compute_pay function can be written as follows:
float compute_pay(float income, float tax_rate,
                  int exemptions = 1)
Notice that the parameters are rearranged so that the one with the default value is last. An example call to the C++ compute_pay function is
pay = compute_pay(20000.0, 0.15);
In most languages that do not have default values for formal parameters, the number of actual parameters in a call must match the number of formal parameters in the subprogram definition header. However, in C, C++, Perl, and JavaScript, this is not required. When there are fewer actual parameters in a call than formal parameters in a function definition, it is the programmer’s responsibility to ensure that the parameter correspondence, which is always positional, and the subprogram execution are sensible.
Although this design, which allows a variable number of parameters, is clearly prone to error, it is also sometimes convenient. For example, the printf function of C can print any number of items (data values and/or literals).
C# allows methods to accept a variable number of parameters, as long as they are of the same type. The method specifies its formal parameter with the params modifier. The call can send either an array or a list of expressions, whose values are placed in an array by the compiler and provided to the called method. For example, consider the following method:
public void DisplayList(params int[] list) {
  foreach (int next in list) {
    Console.WriteLine("Next value {0}", next);
  }
}
If DisplayList is defined for the class MyClass and we have the following declarations,
MyClass myObject = new MyClass();
int[] myList = new int[6] {2, 4, 6, 8, 10, 12};
DisplayList could be called with either of the following:
myObject.DisplayList(myList);
myObject.DisplayList(2, 4, 3 * x - 1, 17);
Ruby supports a complicated but highly flexible actual parameter configuration. The initial parameters are expressions, whose value objects are passed to the corresponding formal parameters. The initial parameters can be followed by a list of key => value pairs, which are placed in an anonymous hash and a reference to that hash is passed to the next formal parameter. These are used as a substitute for keyword parameters, which Ruby does not support. The hash item can be followed by a single parameter preceded by an asterisk. This parameter is called the array formal parameter. When the method is called, the array formal parameter is set to reference a new Array object. All remaining actual parameters are assigned to the elements of the new Array object. If the actual parameter that corresponds to the array formal parameter is an array, it must also be preceded by an asterisk, and it must be the last actual parameter.3 So, Ruby allows a variable number of parameters in a way similar to that of C#. Because Ruby arrays can store different types, there is no requirement that the actual parameters passed to the array have the same type.
The following example skeletal function definition and call illustrate the parameter structure of Ruby:
list = [2, 4, 6, 8]
def tester(p1, p2, p3, *p4)
  . . .
end
. . .
tester('first', mon => 72, tue => 68, wed => 59, *list)
Inside tester, the values of its formal parameters are as follows:
p1 is 'first'
p2 is {mon => 72, tue => 68, wed => 59}
p3 is 2
p4 is [4, 6, 8]
Python supports parameters that are similar to those of Ruby.
There are two distinct categories of subprograms—procedures and functions—both of which can be viewed as approaches to extending the language. Subprograms are collections of statements that define parameterized computations. Functions return values and procedures do not. In most languages that do not include procedures as a separate form of subprogram, functions can be defined not to return values and they can be used as procedures. The computations of a procedure are enacted by single call statements. In effect, procedures define new statements. For example, if a particular language does not have a sort statement, a user can build a procedure to sort arrays of data and use a call to that procedure in place of the unavailable sort statement. Only some older languages, such as Fortran and Ada, support procedures.
Procedures can produce results in the calling program unit by two methods: (1) If there are variables that are not formal parameters but are still visible in both the procedure and the calling program unit, the procedure can change them; and (2) if the procedure has formal parameters that allow the transfer of data to the caller, those parameters can be changed.
Functions structurally resemble procedures but are semantically modeled on mathematical functions. If a function is a faithful model, it produces no side effects; that is, it modifies neither its parameters nor any variables defined outside the function. Such a function returns a value—that is its only desired effect. The functions in most programming languages have side effects.
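The difference between a function that models a mathematical function and one with a side effect can be illustrated with a small sketch. Both function names are hypothetical.

```python
def pure_sum(values):
    # No side effects: reads its parameter and returns a value.
    return sum(values)

def sum_and_clear(values):
    # Side effect: empties the caller's list before returning.
    total = sum(values)
    values.clear()
    return total

nums = [1, 2, 3]
a = pure_sum(nums)       # nums is unchanged after this call
b = sum_and_clear(nums)  # nums is empty after this call
```

Both calls return the same value, but only the first leaves the caller's data as it was.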
Functions are called by appearances of their names in expressions, along with the required actual parameters. The value produced by a function’s execution is returned to the calling code, effectively replacing the call itself. For example, the value of the expression f(x) is whatever value f produces when called with the parameter x. For a function that does not produce side effects, the returned value is its only effect.
Functions define new user-defined operators. For example, if a language does not have an exponentiation operator, a function can be written that returns the value of one of its parameters raised to the power of another parameter. Its header in C++ could be
float power(float base, float exp)
which could be called with
result = 3.4 * power(10.0, x)
The standard C++ library includes a similar function named pow. Compare this with the same operation in Perl, in which exponentiation is a built-in operation:
result = 3.4 * 10.0 ** x
In some programming languages, users are permitted to overload operators by defining new functions for operators. User-defined overloaded operators are discussed in Section 9.11.
Subprograms are complex structures, and it follows from this that a lengthy list of issues is involved in their design. One obvious issue is the choice of one or more parameter-passing methods that will be used. The wide variety of approaches that have been used in various languages is a reflection of the diversity of opinion on the subject. A closely related issue is whether the types of actual parameters will be type checked against the types of the corresponding formal parameters.
The nature of the local environment of a subprogram dictates to some degree the nature of the subprogram. The most important question here is whether local variables are statically or dynamically allocated.
Next, there is the question of whether subprogram definitions can be nested. Another issue is whether subprogram names can be passed as parameters. If subprogram names can be passed as parameters and the language allows subprograms to be nested, there is the question of the correct referencing environment of a subprogram that has been passed as a parameter.
As seen in Chapter 5, side effects of functions can cause problems. So, restrictions on side effects are a design issue for functions. The types and number of values that can be returned from functions are other design issues.
Finally, there are the questions of whether subprograms can be overloaded or generic. An overloaded subprogram is one that has the same name as another subprogram in the same referencing environment. A generic subprogram is one whose computation can be done on data of different types in different calls. A closure is a nested subprogram and its referencing environment, which together allow the subprogram to be called from anywhere in a program.
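As a brief illustration of the last of these ideas, a closure can be sketched in Python. The names make_adder and add are hypothetical.

```python
def make_adder(n):
    def add(x):
        # add refers to n, which belongs to the enclosing call
        # of make_adder.
        return x + n
    # Returning add returns the function together with its
    # referencing environment (the binding of n).
    return add

add_five = make_adder(5)

# make_adder has already returned, yet add_five can still be
# called and still sees n == 5.
result = add_five(10)
```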
The following is a summary of these design issues for subprograms in general. Additional issues that are specifically associated with functions are discussed in Section 9.10.
Are local variables statically or dynamically allocated?
Can subprogram definitions appear in other subprogram definitions?
What parameter-passing method or methods are used?
Are the types of the actual parameters checked against the types of the formal parameters?
If subprograms can be passed as parameters and subprograms can be nested, what is the referencing environment of a passed subprogram?
Are functional side effects allowed?
What types of values can be returned from functions?
How many values can be returned from functions?
Can subprograms be overloaded?
Can subprograms be generic?
If the language allows nested subprograms, are closures supported?
These issues and example designs are discussed in the following sections.
This section discusses the issues related to variables that are defined within subprograms. The issue of nested subprogram definitions is also briefly covered.
Subprograms can define their own variables, thereby defining local referencing environments. Variables that are defined inside subprograms are called local variables, because their scope is usually the body of the subprogram in which they are defined.
In the terminology of Chapter 5, local variables can be either static or stack dynamic. If local variables are stack dynamic, they are bound to storage when the subprogram begins execution and are unbound from storage when that execution terminates. There are several advantages of stack-dynamic local variables, the primary one being flexibility. It is essential that recursive subprograms have stack-dynamic local variables. Another advantage of stack-dynamic locals is that the storage for local variables in an active subprogram can be shared with the local variables in all inactive subprograms. This is not as important an advantage as it was when computers had smaller memories.
The main disadvantages of stack-dynamic local variables are the following: First, there is the cost of the time required to allocate, initialize (when necessary), and deallocate such variables for each call to the subprogram. Second, accesses to stack-dynamic local variables must be indirect, whereas accesses to static variables can be direct.4 This indirectness is required because the place in the stack where a particular local variable will reside can be determined only during execution (see Chapter 10). Finally, when all local variables are stack dynamic, subprograms cannot be history sensitive; that is, they cannot retain data values of local variables between calls. It is sometimes convenient to be able to write history-sensitive subprograms. A common example of a need for a history-sensitive subprogram is one whose task is to generate pseudorandom numbers. Each call to such a subprogram computes one pseudorandom number, using the last one it computed. It must, therefore, store the last one in a static local variable. Coroutines and the subprograms used in iterator loop constructs (discussed in Chapter 8) are other examples of subprograms that need to be history sensitive.
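Python has no static local variables, but a history-sensitive pseudorandom generator can be sketched with a generator function, whose local variables persist between resumptions. The linear congruential constants below are a commonly used set and, like the name lcg, are assumptions made for illustration, not values from the text.

```python
def lcg(seed, a=1103515245, c=12345, m=2**31):
    # state persists across next() calls, playing the role that
    # a static local variable would play in C.
    state = seed
    while True:
        # Each number is computed from the last one produced.
        state = (a * state + c) % m
        yield state

gen = lcg(1)
first = next(gen)
second = next(gen)
```

Each call to next resumes the subprogram where it left off, so the previous value is still available without any global variable.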
The primary advantage of static local variables over stack-dynamic local variables is that they are slightly more efficient—they require no run-time overhead for allocation and deallocation. Also, if accessed directly, these accesses are obviously more efficient. And, of course, they allow subprograms to be history sensitive. The greatest disadvantage of static local variables is their inability to support recursion. Also, their storage cannot be shared with the local variables of other inactive subprograms.
In most contemporary languages, local variables in a subprogram are by default stack dynamic. In C and C++ functions, locals are stack dynamic unless specifically declared to be static. For example, in the following C (or C++) function, the variable sum is static and count is stack dynamic.
int adder(int list[], int listlen) {
  static int sum = 0;
  int count;
  for (count = 0; count < listlen; count++)
    sum += list[count];
  return sum;
}
The methods of C++, Java, and C# have only stack-dynamic local variables.
In Python, the only declarations used in method definitions are for globals. Any variable declared to be global in a method must be a variable defined outside the method. A variable defined outside the method can be referenced in the method without declaring it to be global, but such a variable cannot be assigned in the method. If the name of a global variable is assigned in a method, it is implicitly declared to be a local and the assignment does not disturb the global. All local variables in Python methods are stack dynamic.
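These rules can be checked with a short sketch. The variable total and the three function names are hypothetical.

```python
total = 0  # defined outside any function

def read_only():
    # A global can be referenced without declaring it global.
    return total

def shadow():
    # Assigning the name makes it an implicit local; the global
    # is not disturbed.
    total = 99
    return total

def update():
    # The global declaration lets the assignment reach the
    # variable defined outside the function.
    global total
    total = 5

before = read_only()
shadowed = shadow()
after_shadow = total   # still the original value
update()
after_update = total   # changed by update
```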
The idea of nesting subprograms originated with ALGOL 60. The motivation was to be able to create a hierarchy of both logic and scopes. If a subprogram is needed only within another subprogram, why not place it there and hide it from the rest of the program? Because static scoping is usually used in languages that allow subprograms to be nested, this also provides a highly structured way to grant access to nonlocal variables in enclosing subprograms. Recall that in Chapter 5, the problems introduced by this were discussed. For a long time, the only languages that allowed nested subprograms were those directly descending from ALGOL 60, which were ALGOL 68, Pascal, and Ada. Many other languages, including all of the direct descendants of C, do not allow subprogram nesting. Recently, some new languages again allow it. Among these are JavaScript, Python, and Ruby. Also, most functional programming languages allow subprograms to be nested.
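Nesting in Python can be sketched as follows. The function names are hypothetical; the point is that the helpers are hidden from the rest of the program and can read a local variable of the enclosing function under static scoping.

```python
def average_of_squares(values):
    count = len(values)  # local variable of the enclosing function

    def square(x):
        # Nested helper: not visible outside average_of_squares.
        return x * x

    def mean(total):
        # Reads count from the enclosing function's scope.
        return total / count

    return mean(sum(square(v) for v in values))

result = average_of_squares([1, 2, 3])
```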
Parameter-passing methods are the ways in which parameters are transmitted to and/or from called subprograms. First, we focus on the different semantics models of parameter-passing methods. Then, we discuss the various implementation models invented by language designers for these semantics models. Next, we survey the design choices of several languages and discuss the actual methods used to implement the implementation models. Finally, we consider the design considerations that face a language designer in choosing among the methods.
Formal parameters are characterized by one of three distinct semantics models: (1) They can receive data from the corresponding actual parameter; (2) they can transmit data to the actual parameter; or (3) they can do both. These models are called in mode, out mode, and inout mode, respectively. For example, consider a subprogram that takes two arrays of int values as parameters—list1 and list2. The subprogram must add list1 to list2 and return the result as a revised version of list2. Furthermore, the subprogram must create a new array from the two given arrays and return it. For this subprogram, list1 should be in mode, because it is not to be changed by the subprogram. list2 must be inout mode, because the subprogram needs the given value of the array and must return its new value. The third array should be out mode, because there is no initial value for this array and its computed value must be returned to the caller.
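The list1 and list2 example can be approximated in Python, with some caveats. Python does not have declared parameter modes, so the modes here hold only by convention. The third array is modeled as a return value, and forming it by concatenation is an assumption, since the text does not say how the new array is built. The lists are assumed to have equal length.

```python
def combine(list1, list2):
    # list1 is treated as in mode: it is read but never changed.
    # list2 is treated as inout mode: its given values are used
    # and its updated values are visible to the caller.
    for i in range(len(list2)):
        list2[i] += list1[i]
    # The new array stands in for the out-mode parameter: it has
    # no initial value and is delivered to the caller.
    return list1 + list2

a = [1, 2, 3]
b = [10, 20, 30]
c = combine(a, b)
```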
There are two conceptual models of how data transfers take place in parameter transmission: Either an actual value is copied (to the caller, to the called, or both ways) or an access path is transmitted. Most commonly, the access path is a simple pointer or reference. Figure 9.1 illustrates the three semantics models of parameter passing when values are copied.
A variety of models have been developed by language designers to guide the implementation of the three basic parameter transmission modes. In the following sections, we discuss several of these, along with their relative strengths and weaknesses.
When a parameter is passed by value, the value of the actual parameter is used to initialize the corresponding formal parameter, which then acts as a local variable in the subprogram, thus implementing in-mode semantics.
Pass-by-value is normally implemented by copy, because accesses often are more efficient with this approach. It could be implemented by transmitting an access path to the value of the actual parameter in the caller, but that would require that the value be in a write-protected cell (one that can only be read). Enforcing the write protection is not always a simple matter. For example, suppose the subprogram to which the parameter was passed passes it in turn to another subprogram. This is another reason to use copy transfer. As we will see in Section 9.5.4, C++ provides a convenient and effective method for specifying write protection on pass-by-value parameters that are transmitted by access path.
The advantage of pass-by-value is that for scalars it is fast, in both linkage cost and access time.
The main disadvantage of the pass-by-value method if copies are used is that additional storage is required for the formal parameter, either in the called subprogram or in some area outside both the caller and the called subprogram. In addition, the actual parameter must be copied to the storage area for the corresponding formal parameter. The storage and the copy operations can be costly if the parameter is large, such as an array with many elements.
Pass-by-result is an implementation model for out-mode parameters. When a parameter is passed by result, no value is transmitted to the subprogram. The corresponding formal parameter acts as a local variable, but just before control is transferred back to the caller, its value is transmitted back to the caller’s actual parameter, which obviously must be a variable. (How would the caller reference the computed result if it were a literal or an expression?)
The pass-by-result method has the advantages and disadvantages of pass-by-value, plus some additional disadvantages. If values are returned by copy (as opposed to access paths), as they typically are, pass-by-result also requires the extra storage and the copy operations that are required by pass-by-value. As with pass-by-value, the difficulty of implementing pass-by-result by transmitting an access path usually results in it being implemented by copy. In this case, the problem is in ensuring that the initial value of the actual parameter is not used in the called subprogram.
One additional problem with the pass-by-result model is that there can be an actual parameter collision, such as the one created with the call
sub(p1, p1)
In sub, assuming the two formal parameters have different names, the two can obviously be assigned different values. Then, whichever of the two is copied to their corresponding actual parameter last becomes the value of p1 in the caller. Thus, the order in which the actual parameters are copied determines their value. For example, consider the following C# method, which specifies the pass-by-result method with the out specifier on its formal parameter.5
void Fixer(out int x, out int y) {
  x = 17;
  y = 35;
}
. . .
f.Fixer(out a, out a);
If, at the end of the execution of Fixer, the formal parameter x is assigned to its corresponding actual parameter first, then the value of the actual parameter a in the caller will be 35. If y is assigned first, then the value of the actual parameter a in the caller will be 17.
Because the order can be implementation dependent for some languages, different implementations can produce different results.
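The dependence on copy-back order can be imitated in C++ by writing the copy-back step out by hand. The following is only a simulation under stated assumptions: fixer_body, call_fixer_x_first, and call_fixer_y_first are hypothetical names, and the two wrappers stand for two imagined language implementations that differ only in the order of the final copies.

```cpp
#include <cassert>

// Results computed by the body of Fixer on its own local formals.
struct FixerResults { int x; int y; };

FixerResults fixer_body() {
    FixerResults r;
    r.x = 17;
    r.y = 35;
    return r;
}

// Hypothetical implementation A: copy x back first, then y.
void call_fixer_x_first(int& actual_for_x, int& actual_for_y) {
    FixerResults r = fixer_body();
    actual_for_x = r.x;
    actual_for_y = r.y;
}

// Hypothetical implementation B: copy y back first, then x.
void call_fixer_y_first(int& actual_for_x, int& actual_for_y) {
    FixerResults r = fixer_body();
    actual_for_y = r.y;
    actual_for_x = r.x;
}
```

Calling either wrapper with the same variable in both positions leaves that variable holding whichever value was copied last: 35 under implementation A and 17 under implementation B.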
Calling a subprogram with two identical actual parameters can also lead to different kinds of problems when other parameter-passing methods are used, as discussed in Section 9.5.2.4.
Another problem that can occur with pass-by-result is that the implementor may be able to choose between two different times to evaluate the addresses of the actual parameters: at the time of the call or at the time of the return. For example, consider the following C# method and following code:
void DoIt(out int x, ref int index) {
  x = 17;
  index = 42;
}
. . .
sub = 21;
f.DoIt(out list[sub], ref sub);
The address of list[sub] changes between the beginning and end of the method. The implementor must choose the time to bind this parameter to an address—at the time of the call or at the time of the return. If the address is computed on entry to the method, the value 17 will be returned to list[21]; if computed just before return, 17 will be returned to list[42]. This makes programs unportable between an implementation that chooses to evaluate the addresses for out-mode parameters at the beginning of a subprogram and one that chooses to do that evaluation at the end. An obvious way to avoid this problem is for the language designer to specify when the address to be used to return the parameter value must be computed.
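The two binding times can be simulated in C++ by making the copy-back explicit. This sketch assumes, as the discussion does, that the subscript variable has changed by the time the method returns; do_it_body, do_it_bind_at_call, and do_it_bind_at_return are hypothetical names standing for the method body and for two imagined implementations.

```cpp
#include <cassert>
#include <vector>

// Body of DoIt working on locals: returns the value to copy back for x
// and changes the caller's subscript variable along the way.
int do_it_body(int& index) {
    int x = 17;
    index = 42;
    return x;
}

// Hypothetical implementation A: bind the out-mode address at call time.
void do_it_bind_at_call(std::vector<int>& list, int& sub) {
    assert(list.size() > 42);       // room for both candidate elements
    int& target = list[sub];        // address fixed while sub is still 21
    target = do_it_body(sub);
}

// Hypothetical implementation B: bind the out-mode address at return time.
void do_it_bind_at_return(std::vector<int>& list, int& sub) {
    assert(list.size() > 42);
    int result = do_it_body(sub);
    list[sub] = result;             // address computed after sub became 42
}
```

Starting from sub equal to 21, the first version stores 17 in list[21] and the second stores it in list[42], which is exactly the portability problem described above.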
Pass-by-value-result is an implementation model for inout-mode parameters in which actual values are copied. It is in effect a combination of pass-by-value and pass-by-result. The value of the actual parameter is used to initialize the corresponding formal parameter, which then acts as a local variable. In fact, pass-by-value-result formal parameters must have local storage associated with the called subprogram. At subprogram termination, the value of the formal parameter is transmitted back to the actual parameter.
Pass-by-value-result is sometimes called pass-by-copy, because the actual parameter is copied to the formal parameter at subprogram entry and then copied back at subprogram termination.
Pass-by-value-result shares with pass-by-value and pass-by-result the disadvantages of requiring multiple storage for parameters and time for copying values. It shares with pass-by-result the problems associated with the order in which actual parameters are assigned.
The advantages of pass-by-value-result are relative to pass-by-reference, so they are discussed in Section 9.5.2.4.
Pass-by-reference is a second implementation model for inout-mode parameters. Rather than copying data values back and forth, however, as in pass-by-value-result, the pass-by-reference method transmits an access path, usually just an address, to the called subprogram. This provides the access path to the cell storing the actual parameter. Thus, the called subprogram is allowed to access the actual parameter in the calling program unit. In effect, the actual parameter is shared with the called subprogram.
The advantage of pass-by-reference is that the passing process itself is efficient, in terms of both time and space. Duplicate space is not required and no copying is required.
There are, however, several disadvantages to the pass-by-reference method. First, access to the formal parameters will be slower than pass-by-value parameters, because of the additional level of indirect addressing that is required.6 Second, if only one-way communication to the called subprogram is required, inadvertent and erroneous changes may be made to the actual parameter. This issue is addressed below.
Another problem of pass-by-reference is that aliases can be created. This problem should be expected, because pass-by-reference makes access paths available to the called subprograms, thereby providing access to nonlocal variables. The problem with these kinds of aliasing is the same as in other circumstances: It is harmful to readability and thus to reliability. It also makes program verification more difficult. Another issue with pass by reference is whether the called subprogram is allowed to change a passed pointer. In C, this is possible, but in some other languages, such as Pascal and C++, formal parameters that are addresses are implicitly dereferenced in the called subprogram, which prevents such changes.
There are several ways pass-by-reference parameters can create aliases. First, collisions can occur between actual parameters. Consider a C++ function that has two parameters that are to be passed by reference, as in
void fun(int &first, int &second)
If the call to fun happens to pass the same variable twice, as in
fun(total, total)
then first and second in fun will be aliases.
Second, collisions between array elements can also cause aliases. For example, suppose the function fun is called with two array elements that are specified with variable subscripts, as in
fun(list[i], list[j])
If these two parameters are passed by reference and i happens to be equal to j, then first and second are again aliases.
Third, if two of the formal parameters of a subprogram are an element of an array and the whole array, and both are passed by reference, then a call such as
fun1(list[i], list)
could result in aliasing in fun1, because fun1 can access all elements of list through the second parameter and access a single element through its first parameter.
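To make the effect of such collisions concrete, here is a small C++ sketch. The body given to fun is an assumption made for illustration only (the text does not specify one); it is written as though first and second were distinct variables.

```cpp
#include <cassert>

// Written as if first and second were distinct: with the original values,
// the intended result is first + 2 * second.
int fun(int& first, int& second) {
    first = first + second;   // if the formals are aliases, second changes too
    return first + second;
}
```

With two distinct variables that both hold 3, the function returns 9. With fun(total, total), or with fun(list[i], list[j]) when i equals j, the same starting value of 3 produces 12, because the assignment to first silently changes second as well.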
Still another way to get aliasing with pass-by-reference parameters is through collisions between formal parameters and nonlocal variables that are visible. For example, consider the following C code:
int * global;
void main() {
  . . .
  sub(global);
  . . .
}
void sub(int * param) {
  . . .
}
Inside sub, param and global are aliases.
All these possible aliasing situations are eliminated if pass-by-value-result is used instead of pass-by-reference. However, in place of aliasing, other problems sometimes arise, as discussed in Section 9.5.2.3.
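The difference between the two inout models can be sketched in C++ by emulating pass-by-value-result with explicit copies. The function names and bodies below are hypothetical, and the copy-back order (first, then second) is an assumption of the sketch.

```cpp
#include <cassert>

// Pass-by-reference version: the two formals may alias each other.
void fun_ref(int& first, int& second) {
    first = first + 1;
    second = second * 2;
}

// Emulated pass-by-value-result: copy in, work on locals, copy back.
void fun_value_result(int& actual_first, int& actual_second) {
    int first = actual_first;     // copy in
    int second = actual_second;   // copy in
    first = first + 1;
    second = second * 2;
    actual_first = first;         // copy out
    actual_second = second;       // copy out (last copy wins on a collision)
}
```

Called with the same variable, initially 5, in both positions, the reference version leaves 12 in it, while the value-result version leaves 10. Inside the value-result version the formals never alias, but the final value now depends on the copy-back order.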
Pass-by-name is an inout-mode parameter transmission method that does not correspond to a single implementation model. When parameters are passed by name, the actual parameter is, in effect, textually substituted for the corresponding formal parameter in all its occurrences in the subprogram. This method is quite different from those discussed thus far, in which formal parameters are bound to actual values or addresses at the time of the subprogram call. A pass-by-name formal parameter is bound to an access method at the time of the subprogram call, but the actual binding to a value or an address is delayed until the formal parameter is assigned or referenced. Implementing a pass-by-name parameter requires a subprogram to be passed to the called subprogram to evaluate the address or value of the formal parameter. The referencing environment of the passed subprogram must also be passed. This subprogram/referencing environment is a closure (see Section 9.12).7 Pass-by-name parameters are both complex to implement and inefficient. They also add significant complexity to the program, thereby lowering its readability and reliability.
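The closure-based implementation can be approximated in C++ with a function object that re-evaluates the actual parameter expression on every reference. This is a sketch under assumptions, not how any ALGOL 60 compiler was necessarily built: sum_by_name and term are hypothetical names, and a plain reference stands in for the by-name loop variable.

```cpp
#include <cassert>
#include <functional>

// term models a by-name parameter: a closure over the caller's environment
// that is evaluated anew each time the formal is referenced.
double sum_by_name(int& i, int lo, int hi,
                   const std::function<double()>& term) {
    assert(static_cast<bool>(term));   // a thunk must actually be supplied
    double total = 0.0;
    for (i = lo; i <= hi; ++i) {
        total += term();               // sees the current value of i
    }
    return total;
}
```

A caller might pass a lambda such as [&]() { return a[k] * a[k]; } together with its own variable k; because the lambda is evaluated once per iteration, the call sums the squares of the elements of a.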
Because pass-by-name is not part of any widely used language, it is not discussed further here. However, it is used at compile time by the macros in assembly languages and for the generic parameters of the generic subprograms in C++, Java 5.0, and C# 2005, as discussed in Section 9.9.
We now address the question of how the various implementation models of parameter passing are actually implemented.
In most contemporary languages, parameter communication takes place through the run-time stack. The run-time stack is initialized and maintained by the run-time system, which manages the execution of programs. The run-time stack is used extensively for subprogram control linkage and parameter passing, as discussed in Chapter 10. In the following discussion, we assume that the stack is used for all parameter transmission.
Pass-by-value parameters have their values copied into stack locations. The stack locations then serve as storage for the corresponding formal parameters. Pass-by-result parameters are implemented as the opposite of pass-by-value. The values assigned to the pass-by-result actual parameters are placed in the stack, where they can be retrieved by the calling program unit upon termination of the called subprogram. Pass-by-value-result parameters can be implemented directly from their semantics as a combination of pass-by-value and pass-by-result. The stack location for such a parameter is initialized by the call and is then used like a local variable in the called subprogram.
Pass-by-reference parameters are perhaps the simplest to implement. Most languages only allow variables to be passed by reference. However, Fortran passes all forms of parameters by reference. In Fortran, regardless of the type of the actual parameter, only its address must be placed in the stack. In the case of literals, the address of the literal is put in the stack. In the case of an expression, the compiler must build code to evaluate the expression, which must be executed just before the transfer of control to the called subprogram. The address of the memory cell in which the code places the result of its evaluation is then put in the stack. The Fortran compiler must prevent the called subprogram from changing parameters that are literals or expressions.
Access to the formal parameters in the called subprogram is by indirect addressing from the stack location of the address. The implementation of pass-by-value, -result, -value-result, and -reference, where the run-time stack is used, is shown in Figure 9.2. Subprogram sub is called from main with the call sub(w, x, y, z), where w is passed by value, x is passed by result, y is passed by value-result, and z is passed by reference.
Function header: void sub (int a, int b, int c, int d)
Function call in main: sub (w,x,y,z)
(pass w by value, x by result, y by value-result, z by reference)
C uses pass-by-value. Pass-by-reference (inout mode) semantics is achieved by using pointers as parameters. The value of the pointer is made available to the called function and nothing is copied back. However, because what was passed is an access path to the data of the caller, the called function can change the caller’s data, although every reference to a pointer formal parameter must be explicitly dereferenced in the function. C copied this use of the pass-by-value method from ALGOL 68. In both C and C++, formal parameters can be typed as pointers to constants. The corresponding actual parameters need not be constants, for in such cases they are coerced to constants. This allows pointer parameters to provide the efficiency of pass-by-reference with the one-way semantics of pass-by-value. Write protection of those parameters in the called function is implicitly specified.
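A short sketch of these pointer idioms follows; it uses only features shared by C and C++, apart from the C++ header and nullptr. The function names increment, doubled, and retarget are hypothetical.

```cpp
#include <cassert>

// The pointer is passed by value; the caller's int is reached through an
// explicit dereference.
void increment(int* p) {
    assert(p != nullptr);
    *p = *p + 1;
}

// Pointer to constant: the function can read the caller's data, but an
// assignment through p is rejected by the compiler.
int doubled(const int* p) {
    assert(p != nullptr);
    // *p = 0;   // would not compile
    return *p * 2;
}

// Assigning to the pointer parameter changes only the local copy.
void retarget(int* p, int* other) {
    assert(other != nullptr);
    p = other;   // no effect on the caller's pointer
    *p = 99;     // writes to the int that other points to
}
```

Note that doubled can be called with the address of an ordinary non-constant int; the coercion to pointer-to-constant happens at the call.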
C++ includes a special pointer type, called a reference type, as discussed in Chapter 6, which is often used for parameters. Reference parameters are implicitly dereferenced in the function or method, and their semantics is pass-by-reference. C++ also allows reference parameters to be defined to be constants. For example, we could have
void fun(const int &p1, int p2, int &p3) { . . . }
where p1 is pass-by-reference but cannot be changed in the function fun, p2 is pass-by-value, and p3 is pass-by-reference. Neither p1 nor p3 need be explicitly dereferenced in fun.
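Filling in the elided body with a few assignments shows what each kind of parameter permits. The body below is an assumption made only for illustration.

```cpp
#include <cassert>

// p1: read-only reference, p2: local copy, p3: reference to the caller's
// variable. No explicit dereferencing is needed for p1 or p3.
void fun(const int& p1, int p2, int& p3) {
    p2 = p2 + p1;   // changes only the local copy
    p3 = p2;        // visible to the caller
    // p1 = 0;      // would not compile: p1 is const
}
```

After fun(a, b, c) with a equal to 1 and b equal to 2, the caller sees a and b unchanged and c equal to 3.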
ALGOL 60 introduced the pass-by-name method. It also allowed pass-by-value as an option. Primarily because of the difficulty in implementing them, pass-by-name parameters were not carried from ALGOL 60 to any subsequent languages that became popular (other than SIMULA 67).
Constant parameters and in-mode parameters are not exactly alike. Constant parameters clearly implement in mode. However, in all of the common imperative languages except Ada, in-mode parameters can be assigned in the subprogram even though those changes are never reflected in the values of the corresponding actual parameters. In Ada, such an assignment is illegal. Constant parameters can never be assigned.
As with C and C++, all Java parameters are passed by value. However, because objects can be accessed only through reference variables, object parameters are in effect passed by reference. Although an object reference passed as a parameter cannot itself be changed in the called subprogram, the referenced object can be changed if a method is available to cause the change. Because reference variables cannot point to scalar variables directly and Java does not have pointers, scalars cannot be passed by reference in Java (although a reference to an object that contains a scalar can). Therefore, if a scalar is passed to a Java method, it cannot be changed by that method.
ALGOL W (Wirth and Hoare, 1966) introduced the pass-by-value-result method of parameter passing as an alternative to the inefficiency of pass-by-name and the problems of pass-by-reference.
The default parameter-passing method of C# is pass-by-value. Pass-by-reference can be specified by preceding both a formal parameter and its corresponding actual parameter with ref. For example, consider the following C# skeletal method and call:
void sumer(ref int oldSum, int newOne) { . . . }
. . .
sumer(ref sum, newValue);
The first parameter to sumer is passed by reference; the second is passed by value. Every ref actual parameter must be assigned a value before it is passed.
C# also supports out-mode parameters, which are pass-by-reference parameters that do not need initial values. Such parameters are specified in the formal parameter list with the out modifier.
PHP’s parameter passing is similar to that of C#, except that either the actual parameter or the formal parameter can specify pass-by-reference. Pass-by-reference is specified by preceding one or both of the parameters with an ampersand.
In Swift, the default parameter passing method is pass by value, and formal parameters passed this way cannot be changed in the called subprogram. Pass-by-reference semantics can be specified by preceding the formal parameter with the reserved word inout.
Perl employs a primitive means of passing parameters. All actual parameters are implicitly placed in a predefined array named @_ (of all things!). The subprogram retrieves the actual parameter values (or addresses) from this array. The most peculiar thing about this array is its magical nature, exposed by the fact that its elements are in effect aliases for the actual parameters. Therefore, if an element of @_ is changed in the called subprogram, that change is reflected in the corresponding actual parameter in the call, assuming there is a corresponding actual parameter (the number of actual parameters need not be the same as the number of formal parameters) and it is a variable.
The parameter-passing method of Python and Ruby is called pass-by-assignment or pass-by-sharing. Because all data values are objects, every variable is a reference to an object. In pass-by-assignment, the actual parameter value is assigned to the formal parameter. Therefore, pass-by-assignment is in effect pass-by-reference, because the values of all actual parameters are references. However, only in certain cases does this result in pass-by-reference semantics. For example, many objects are essentially immutable. In a pure object-oriented language, the process of changing the value of a variable with an assignment statement, as in
x = x + 1
does not change the object referenced by x. Rather, it takes the object referenced by x, increments it by 1, thereby creating a new object (with the value x + 1), and then changes x to reference the new object. So, when a reference to a scalar object is passed to a subprogram, the object being referenced cannot be changed in place. Because the reference is passed by value, even though the formal parameter is changed in the subprogram, that change has no effect on the actual parameter in the caller.
Now, suppose a reference to an array is passed as a parameter. If the corresponding formal parameter is assigned a new array object, there is no effect on the caller. However, if the formal parameter is used to assign a value to an element of the array, as in
list[3] = 47
the actual parameter is affected. So, changing the reference of the formal parameter has no effect on the caller, but changing an element of the array that is passed as a parameter does.
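The same behavior can be imitated in C++ by passing a shared pointer by value, so that the formal parameter is a copy of a reference to a shared object. This is an analogy rather than a description of how Python or Ruby is implemented; ListRef, rebind, and set_element are hypothetical names.

```cpp
#include <cassert>
#include <memory>
#include <vector>

using ListRef = std::shared_ptr<std::vector<int>>;

// The reference is passed by value: rebinding the formal is a local change.
void rebind(ListRef list) {
    list = std::make_shared<std::vector<int>>(5, 0);   // caller unaffected
}

// Changing the shared object through the formal is visible to the caller.
void set_element(ListRef list) {
    assert(list && list->size() > 3);
    (*list)[3] = 47;
}
```

After rebind the caller's array is untouched, whereas after set_element the caller sees 47 in element 3.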
It is now widely accepted that software reliability demands that the types of actual parameters be checked for consistency with the types of the corresponding formal parameters. Without such type checking, small typographical errors can lead to program errors that may be difficult to diagnose because they are not detected by the compiler or the run-time system. For example, in the function call
result = sub1(1)
the actual parameter is an integer constant. If the formal parameter of sub1 is a floating-point type, no error will be detected without parameter type checking. Although an integer 1 and a floating-point 1 have the same value, the representations of these two are very different. sub1 cannot produce a correct result given an integer actual parameter value if it expects a floating-point value.
Early programming languages, such as Fortran 77 and the original version of C, did not require parameter type checking; most later languages require it. However, the relatively recent languages Perl, JavaScript, and PHP do not.
C and C++ require some special discussion in the matter of parameter type checking. In the original C, neither the number of parameters nor their types were checked. In C89, the formal parameters of functions can be defined in two ways. They can be defined as in the original C; that is, the names of the parameters are listed in parentheses and the type declarations for them follow, as in the following function:
double sin(x)
  double x;
{ . . . }
Using this form avoids type checking, thereby allowing calls such as
double value;
int count;
. . .
value = sin(count);
to be legal, although they are never correct.
The alternative to the original C definition approach is called the prototype method, in which the formal parameter types are included in the list, as in
double sin(double x)
{ . . . }
If this version of sin is called with the same call, that is, with the following, it is also legal:
value = sin(count);
The type of the actual parameter (int) is checked against that of the formal parameter (double). Although they do not match, int is coercible to double (it is a widening coercion), so the conversion is done. If the conversion is not possible (for example, if the actual parameter had been an array) or if the number of parameters is wrong, then a semantics error is detected. So in C89, the user chooses whether parameters are to be type checked.
In C99 and C++, all functions must have their formal parameters in prototype form. However, type checking can be avoided for some of the parameters by replacing the last part of the parameter list with an ellipsis, as in
int printf(const char* format_string, ...);
A call to printf must include at least one parameter, a pointer to a literal character string. Beyond that, anything (including nothing) is legal. The way printf determines whether there are additional parameters is by the presence of format codes in the string parameter. For example, the format code for integer output is %d. This appears as part of the string, as in the following:
printf("The sum is %d\n", sum);
The % tells the printf function that there is one more parameter.
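The mechanism behind the ellipsis can be sketched with the standard variable-argument macros. In this hypothetical sum_ints function, an explicit count plays the role that format codes play for printf; it is not part of any library.

```cpp
#include <cassert>
#include <cstdarg>

// count tells the function how many int arguments follow. Nothing checks
// that the caller actually supplied that many, or that they are ints.
int sum_ints(int count, ...) {
    assert(count >= 0);
    va_list args;
    va_start(args, count);
    int total = 0;
    for (int i = 0; i < count; ++i) {
        total += va_arg(args, int);   // the type is taken on trust
    }
    va_end(args);
    return total;
}
```

A call such as sum_ints(3, 1, 2, 3) returns 6. A call that passes fewer values than count promises, or values of another type, compiles without complaint, which is the loss of type checking discussed above.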
There is one more interesting issue with actual to formal parameter coercions when primitives can be passed by reference, as in C#. Suppose a call to a method passes a float value to a double formal parameter. If this parameter is passed by value, the float value is coerced to double and there is no problem. This particular coercion is very useful, for it allows a library to provide double versions of subprograms that can be used for both float and double values. However, suppose the parameter is passed by reference. When the value of the double formal parameter is returned to the float actual parameter in the caller, the value will overflow its location. To avoid this problem, C# requires the type of a ref actual parameter to match exactly the type of its corresponding formal parameter (no coercion is allowed).
In Python and Ruby, there is no type checking of parameters, because typing in these languages is a different concept. Objects have types, but variables do not, so formal parameters are typeless. This disallows the very idea of type checking parameters.
The storage-mapping functions that are used to map the index values of references to elements of multidimensional arrays to addresses in memory were discussed at length in Chapter 6. In some languages, such as C and C++, when a multidimensional array is passed as a parameter to a subprogram, the compiler must be able to build the mapping function for that array while seeing only the text of the subprogram (not the calling subprogram). This is true because the subprograms can be compiled separately from the programs that call them. Consider the problem of passing a matrix to a function in C. Multidimensional arrays in C are really arrays of arrays, and they are stored in row major order. Following is a storage-mapping function for row major order for matrices when the lower bound of all indices is 0 and the element size is 1:
地址(mat[i, j]) = 地址(mat[0, 0]) + i * 列数 + j
address(mat[i, j]) = address(mat[0, 0]) + i * number_of_columns + j
请注意,此映射函数需要列数而不是行数。因此,在 C 和 C++ 中,当矩阵作为参数传递时,形式参数必须在第二对括号中包含列数。以下骨架 C 程序说明了这一点:
Notice that this mapping function needs the number of columns but not the number of rows. Therefore, in C and C++, when a matrix is passed as a parameter, the formal parameter must include the number of columns in the second pair of brackets. This is illustrated in the following skeletal C program:
void fun(int matrix[][10]) {
. . . }
void main() {
int mat[5][10];
. . .
fun(mat);
. . .
}
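As a minimal sketch of this convention, the following function fixes the column count in its formal parameter and receives the row count as an ordinary parameter. The names NUM_COLS, sum_matrix, num_rows, and sum_of_ones are introduced here for illustration only and do not appear in the skeletal program.

```cpp
#include <cassert>

const int NUM_COLS = 10;  // assumed column count; must match the caller's array

// The column count is part of the parameter's type; the row count is not,
// so it is passed separately.
int sum_matrix(int matrix[][NUM_COLS], int num_rows) {
    assert(num_rows >= 0);
    int sum = 0;
    for (int row = 0; row < num_rows; row++) {
        for (int col = 0; col < NUM_COLS; col++) {
            // The compiler maps this reference to base + row * NUM_COLS + col.
            sum += matrix[row][col];
        }
    }
    return sum;
}

// Builds a 5-by-10 matrix of ones and sums it through sum_matrix.
int sum_of_ones() {
    int mat[5][NUM_COLS];
    for (int row = 0; row < 5; row++)
        for (int col = 0; col < NUM_COLS; col++)
            mat[row][col] = 1;
    return sum_matrix(mat, 5);
}
```

A matrix declared with any column count other than NUM_COLS cannot be passed to sum_matrix, because its type would not match the formal parameter.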
这种将矩阵作为参数传递的方法的问题在于,它不允许程序员编写可以接受具有不同列数的矩阵的函数;必须为每个具有不同列数的矩阵编写一个新函数。这实际上不允许编写灵活的函数,如果这些函数处理多维数组,则可能有效地重复使用。在 C 和 C++ 中,由于包含指针算法,因此有一种方法可以解决这个问题。矩阵可以作为指针传递,矩阵的实际维度也可以作为参数传递。然后,每次必须引用矩阵的元素时,函数都可以使用指针算法来评估用户编写的存储映射函数。例如,考虑以下函数原型:
The problem with this method of passing matrices as parameters is that it does not allow a programmer to write a function that can accept matrices with different numbers of columns; a new function must be written for every matrix with a different number of columns. This, in effect, disallows writing flexible functions that may be effectively reusable if the functions deal with multidimensional arrays. In C and C++, there is a way around the problem because of their inclusion of pointer arithmetic. The matrix can be passed as a pointer, and the actual dimensions of the matrix also can be passed as parameters. Then, the function can evaluate the user-written storage-mapping function using pointer arithmetic each time an element of the matrix must be referenced. For example, consider the following function prototype:
void fun(float *mat_ptr,
int num_rows,
int num_cols);
下面的语句可用于将变量 x 的值移动到 fun 中参数矩阵的 [row][col] 元素中:
The following statement can be used to move the value of the variable x to the [row][col] element of the parameter matrix in fun:
*(mat_ptr + (row * num_cols) + col) = x;
虽然这个方法可行,但显然很难阅读,而且由于其复杂性,很容易出错。可以使用宏来定义存储映射函数来缓解阅读困难,例如
Although this works, it is obviously difficult to read, and because of its complexity, it is error prone. The difficulty with reading this can be alleviated by using a macro to define the storage-mapping function, such as
#define mat_ptr(r,c) (*(mat_ptr + ((r) * (num_cols) + (c))))
这样,任务就可以写成
With this, the assignment can be written as
mat_ptr(row,col) = x;
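The pointer-arithmetic approach can also be wrapped in small functions rather than a macro. The following is only a sketch under that assumption; element_at, set_element, and store_and_read are hypothetical names chosen for this example.

```cpp
#include <cassert>

// Row-major address of element [row][col]; num_cols is supplied by the caller.
float *element_at(float *mat_ptr, int num_cols, int row, int col) {
    return mat_ptr + (row * num_cols) + col;
}

// Stores x in element [row][col] of a matrix passed as a flat pointer.
void set_element(float *mat_ptr, int num_rows, int num_cols,
                 int row, int col, float x) {
    assert(row >= 0 && row < num_rows && col >= 0 && col < num_cols);
    *element_at(mat_ptr, num_cols, row, col) = x;
}

// Applies set_element to a 3-by-4 matrix and reads the value back
// with ordinary subscripting.
float store_and_read() {
    float mat[3][4] = {};
    set_element(&mat[0][0], 3, 4, 2, 1, 7.5f);
    return mat[2][1];
}
```

Because the dimensions arrive as run-time values, the same functions can serve matrices with any number of columns.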
其他语言使用不同的方法来处理传递多维数组的问题。
Other languages use different approaches to dealing with the problem of passing multidimensional arrays.
在 Java 和 C# 中,数组是对象。它们都是单维的,但元素可以是数组。每个数组都继承一个命名常量(Java 中为 length,C# 中为 Length),该常量在创建数组对象时设置为数组的长度。矩阵的形式参数以两组空括号出现,如以下 Java 方法所示:
In Java and C#, arrays are objects. They are all single dimensioned, but the elements can be arrays. Each array inherits a named constant (length in Java and Length in C#) that is set to the length of the array when the array object is created. The formal parameter for a matrix appears with two sets of empty brackets, as in the following Java method:
float sumer(float mat[][]) {
  float sum = 0.0f;
  for (int row = 0; row < mat.length; row++) {
    for (int col = 0; col < mat[row].length; col++) {
      sum += mat[row][col];
    } //** for (int col . . .
  } //** for (int row . . .
  return sum;
}
因为每个数组都有自己的长度值,所以矩阵中的行可以有不同的长度。
Because each array has its own length value, in a matrix the rows can have different lengths.
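A rough C++ analogue, offered only as a sketch, uses a vector of vectors, where each row's size() plays the role of Java's length. The function names sum_jagged and sum_example are introduced here and are not from the Java example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Each row is its own vector and carries its own size, so no dimensions
// need to appear in the formal parameter.
float sum_jagged(const std::vector<std::vector<float> > &mat) {
    float sum = 0.0f;
    for (std::size_t row = 0; row < mat.size(); row++) {
        for (std::size_t col = 0; col < mat[row].size(); col++) {
            sum += mat[row][col];
        }
    }
    return sum;
}

// Builds a matrix whose rows have lengths 1, 2, and 0, then sums it.
float sum_example() {
    std::vector<std::vector<float> > mat(3);
    mat[0].push_back(1.0f);
    mat[1].push_back(2.0f);
    mat[1].push_back(3.0f);
    assert(mat[0].size() != mat[1].size());  // the rows differ in length
    return sum_jagged(mat);
}
```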
选择参数传递方法时需要考虑两个重要因素:效率以及是否需要单向或双向数据传输。
Two important considerations are involved in choosing parameter-passing methods: efficiency and whether one-way or two-way data transfer is needed.
当代软件工程原则规定应尽量减少子程序代码对子程序外部数据的访问。考虑到这一目标,当不通过参数向调用方返回任何数据时,应使用输入模式参数。当不向被调用的子程序传输任何数据但子程序必须将数据传回调用方时,应使用输出模式参数。最后,仅当数据必须在调用方和被调用的子程序之间双向移动时,才应使用输入输出模式参数。
Contemporary software-engineering principles dictate that access by subprogram code to data outside the subprogram should be minimized. With this goal in mind, in-mode parameters should be used whenever no data are to be returned through parameters to the caller. Out-mode parameters should be used when no data are transferred to the called subprogram but the subprogram must transmit data back to the caller. Finally, inout-mode parameters should be used only when data must move in both directions between the caller and the called subprogram.
有一个实际考虑与此原则相冲突。有时传递单向参数传输的访问路径是合理的。例如,当要将一个大数组传递给不修改它的子程序时,可能首选单向方法。但是,按值传递需要将整个数组移动到子程序的本地存储区域。这会在时间和空间上都代价高昂。因此,大数组通常通过引用传递。这正是 Ada 83 定义允许实现者在两种结构化参数方法之间进行选择的原因。C++ 常量引用参数提供了另一种解决方案。另一种替代方法是允许用户在两种方法之间进行选择。
There is a practical consideration that is in conflict with this principle. Sometimes it is justifiable to pass access paths for one-way parameter transmission. For example, when a large array is to be passed to a subprogram that does not modify it, a one-way method may be preferred. However, pass-by-value would require that the entire array be moved to a local storage area of the subprogram. This would be costly in both time and space. Because of this, large arrays are often passed by reference. This is precisely the reason why the Ada 83 definition allowed implementors to choose between the two methods for structured parameters. C++ constant reference parameters offer another solution. Another alternative approach would be to allow the user to choose between the methods.
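The trade-off can be sketched in C++ as follows. Both functions compute the same sum, but the first copies its argument and the second receives only an access path that it may not write through. The function names are chosen for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Pass-by-value: the whole vector is copied into the formal parameter.
long sum_by_value(std::vector<int> data) {
    long sum = 0;
    for (std::size_t k = 0; k < data.size(); k++) sum += data[k];
    return sum;
}

// Constant reference: no copy is made, and the compiler rejects any
// attempt to modify the actual parameter through data.
long sum_by_const_ref(const std::vector<int> &data) {
    long sum = 0;
    for (std::size_t k = 0; k < data.size(); k++) sum += data[k];
    // data[0] = 0;   // would not compile: data is read-only here
    return sum;
}

// Confirms that both versions see the same values.
long compare_sums() {
    std::vector<int> big(100000, 1);
    long by_value = sum_by_value(big);
    long by_ref = sum_by_const_ref(big);
    assert(by_value == by_ref);
    return by_ref;
}
```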
函数参数传递方法的选择与另一个设计问题有关:函数副作用。该问题将在第9.10节 中讨论。
The choice of a parameter-passing method for functions is related to another design issue: functional side effects. This issue is discussed in Section 9.10.
考虑以下 C 函数:
Consider the following C function:
void swap1(int a, int b) {
int temp = a;
a = b;
b = temp;
}
假设这个函数被调用
Suppose this function is called with
swap1(c, d);
回想一下,C 使用按值传递。swap1 的操作可以用以下伪代码来描述:
Recall that C uses pass-by-value. The actions of swap1 can be described by the following pseudocode:
a = c - Move first parameter value in
b = d - Move second parameter value in
temp = a
a = b
b = temp
尽管 a 最终得到 d 的值,b 最终得到 c 的值,但 c 和 d 的值保持不变,因为没有任何内容被传回给调用者。
Although a ends up with d’s value and b ends up with c’s value, the values of c and d are unchanged because nothing is transmitted back to the caller.
我们可以修改C的swap函数来处理指针参数,达到引用传递的效果:
We can modify the C swap function to deal with pointer parameters to achieve the effect of pass-by-reference:
void swap2(int *a, int *b) {
int temp = *a;
*a = *b;
*b = temp;
}
swap2可以这样调用
swap2 can be called with
swap2(&c, &d);
swap2 的动作可以用以下内容来描述:
The actions of swap2 can be described with the following:
a = &c - Move first parameter address in
b = &d - Move second parameter address in
temp = *a
*a = *b
*b = temp
在这种情况下,交换操作成功:c和的值d实际上是互换的。swap2可以使用引用参数在 C++ 中编写如下:
In this case, the swap operation is successful: The values of c and d are in fact interchanged. swap2 can be written in C++ using reference parameters as follows:
void swap2(int &a, int &b) {
int temp = a;
a = b;
b = temp;
}
这种简单的交换操作在 Java 中是无法实现的,因为它既没有指针,也没有 C++ 的引用。在 Java 中,引用变量只能指向对象,而不能指向标量值。
This simple swap operation is not possible in Java, because it has neither pointers nor C++’s kind of references. In Java, a reference variable can point to only an object, not a scalar value.
传递值结果的语义与传递引用的语义相同,除非涉及别名。Ada 对输入输出模式标量参数使用传递值结果。要探索传递值结果,请考虑以下函数,我们swap3假设它使用传递值结果参数。它的语法与 Ada 的语法类似。
The semantics of pass-by-value-result are identical to those of pass-by-reference, except when aliasing is involved. Ada uses pass-by-value-result for inout-mode scalar parameters. To explore pass-by-value-result, consider the following procedure, swap3, which we assume uses pass-by-value-result parameters. It is written in a syntax similar to that of Ada.
procedure swap3(a : in out Integer, b : in out Integer) is
temp : Integer;
begin
temp := a;
a := b;
b := temp;
end swap3
假设swap3被调用
Suppose swap3 is called with
swap3(c, d);
swap3此调用的操作是
The actions of swap3 with this call are
addr_c = &c - Move first parameter address in
addr_d = &d - Move second parameter address in
a = *addr_c - Move first parameter value in
b = *addr_d - Move second parameter value in
temp = a
a = b
b = temp
*addr_c = a - Move first parameter value out
*addr_d = b - Move second parameter value out
因此,这个 swap 子程序再次正确运行。接下来,考虑调用
So once again, this swap subprogram operates correctly. Next, consider the call
swap3(i, list[i]);
在这种情况下,操作是
In this case, the actions are
addr_i = &i - Move first parameter address in
addr_listi = &list[i] - Move second parameter address in
a = *addr_i - Move first parameter value in
b = *addr_listi - Move second parameter value in
temp = a
a = b
b = temp
*addr_i = a - Move first parameter value out
*addr_listi = b - Move second parameter value out
同样,子程序运行正确,因为返回参数值的地址是在调用时而不是返回时计算的。如果实际参数的地址是在返回时计算的,结果将是错误的。
Again, the subprogram operates correctly, in this case because the addresses to which to return the values of the parameters are computed at the time of the call rather than at the time of the return. If the addresses of the actual parameters were computed at the time of the return, the results would be wrong.
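C++ has no pass-by-value-result, so the following sketch simulates it by hand under the assumption that the caller computes both addresses once, at the time of the call. The function swap_index_and_element and the particular values of i and list are hypothetical.

```cpp
#include <cassert>

// Simulated pass-by-value-result: the same addresses are used for the
// copy-in at entry and the copy-out at exit.
void swap3(int *addr_a, int *addr_b) {
    assert(addr_a != 0 && addr_b != 0);
    int a = *addr_a;   // move first parameter value in
    int b = *addr_b;   // move second parameter value in
    int temp = a;
    a = b;
    b = temp;
    *addr_a = a;       // move first parameter value out
    *addr_b = b;       // move second parameter value out
}

// Performs swap3(i, list[i]) with i = 1 and list = {10, 20, 30}, then
// reports the final i and the final list[1].
void swap_index_and_element(int *i_out, int *elem_out) {
    int i = 1;
    int list[3] = {10, 20, 30};
    swap3(&i, &list[i]);   // &list[i] is evaluated here, while i is still 1
    *i_out = i;
    *elem_out = list[1];
}
```

Had the address of list[i] been recomputed at return time, i would already hold 20 and the copy-out would target a location outside the array.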
最后,我们必须探讨当别名涉及传递值结果和传递引用时会发生什么。考虑以下用类似 C 的语法编写的骨架程序:
Finally, we must explore what happens when aliasing is involved with pass-by-value-result and pass-by-reference. Consider the following skeletal program written in C-like syntax:
int i = 3; /* i is a global variable */
void fun(int a, int b) {
i = b;
}
void main() {
int list[10];
list[i] = 5;
fun(i, list[i]);
}
在 fun 中,如果使用按引用传递,i 和 a 是别名。如果使用按值结果传递,i 和 a 不是别名。假设使用按值结果传递,fun 的操作如下:
In fun, if pass-by-reference is used, i and a are aliases. If pass-by-value-result is used, i and a are not aliases. The actions of fun, assuming pass-by-value-result, are as follows:
addr_i = &i - Move first parameter address in
addr_listi = &list[i] - Move second parameter address in
a = *addr_i - Move first parameter value in
b = *addr_listi - Move second parameter value in
i = b - Sets i to 5
*addr_i = a - Move first parameter value out
*addr_listi = b - Move second parameter value out
在这种情况下,fun 中对全局变量 i 的赋值将其值从 3 改为 5,但第一个形式参数的复制回(示例中的倒数第二行)将其重新设置为 3。这里重要的观察是,如果使用按引用传递,则不会发生复制回,因为它不是语义的一部分,i 仍然是 5。还要注意,因为第二个参数的地址是在 fun 的开头计算的,所以对全局变量 i 的任何更改都不会影响最后用于返回 list[i] 的值的地址。
In this case, the assignment to the global i in fun changes its value from 3 to 5, but the copy back of the first formal parameter (the second to last line in the example) sets it back to 3. The important observation here is that if pass-by-reference is used, no copy back takes place, because it is not part of the semantics, and i remains 5. Also note that because the address of the second parameter is computed at the beginning of fun, any change to the global i has no effect on the address used at the end to return the value of list[i].
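The two outcomes can be compared side by side in C++. This is a sketch in which pass-by-reference uses real reference parameters and pass-by-value-result is simulated with explicit copies; the names fun_by_reference, fun_by_value_result, and the two run functions are introduced for this example.

```cpp
#include <cassert>

int i = 3;       // global, as in the skeletal program
int list[10];

// Pass-by-reference: a is an alias for the global i.
void fun_by_reference(int &a, int &b) {
    assert(&a == &i);   // a and i name the same location
    i = b;              // i, and therefore a, becomes 5
}

// Pass-by-value-result, simulated with copy-in and copy-out.
void fun_by_value_result(int *addr_a, int *addr_b) {
    int a = *addr_a;    // a is a separate copy holding 3
    int b = *addr_b;
    i = b;              // sets i to 5
    *addr_a = a;        // copy back resets i to 3
    *addr_b = b;
}

// Each run function resets the globals, makes the call, and returns i.
int run_by_reference() {
    i = 3;
    list[i] = 5;
    fun_by_reference(i, list[i]);
    return i;
}

int run_by_value_result() {
    i = 3;
    list[i] = 5;
    fun_by_value_result(&i, &list[i]);
    return i;
}
```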
在编程中,许多情况如果能将子程序名作为参数发送给其他子程序,处理起来就会非常方便。一个常见的例子是,子程序必须对某个数学函数进行采样。例如,进行数值积分的子程序通过在多个不同点对函数进行采样来估计函数图下的面积。编写这样的子程序时,它应该可用于任何给定函数;不必为每个必须积分的函数重写它。因此,将用于计算要积分的数学函数的程序函数名作为参数发送给积分子程序是很自然的。
In programming, a number of situations occur that are most conveniently handled if subprogram names can be sent as parameters to other subprograms. One common example of these occurs when a subprogram must sample some mathematical function. For example, a subprogram that does numerical integration estimates the area under the graph of a function by sampling the function at a number of different points. When such a subprogram is written, it should be usable for any given function; it should not need to be rewritten for every function that must be integrated. It is therefore natural that the name of a program function that evaluates the mathematical function to be integrated be sent to the integrating subprogram as a parameter.
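As an illustration, the following sketch passes the function to be integrated as a pointer parameter. The midpoint rule is one simple sampling scheme chosen here for brevity; the names integrate, square, and line are hypothetical.

```cpp
#include <cassert>

// Midpoint-rule estimate of the area under f on [low, high], using
// steps equal-width samples. f is a pointer to the function to sample.
double integrate(double (*f)(double), double low, double high, int steps) {
    assert(steps > 0);
    double width = (high - low) / steps;
    double area = 0.0;
    for (int k = 0; k < steps; k++) {
        area += f(low + (k + 0.5) * width) * width;
    }
    return area;
}

// Two mathematical functions that can be sent to integrate.
double square(double x) { return x * x; }
double line(double x) { return 2.0 * x; }
```

The same integrate serves both functions: integrate(square, 0.0, 1.0, 1000) approximates 1/3, and integrate(line, 0.0, 3.0, 10) approximates 9.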
虽然这个想法很自然,而且看似简单,但其工作原理的细节可能会令人困惑。如果只需要传输子程序代码,则可以通过传递单个指针来完成。然而,出现了两个复杂情况。
Although the idea is natural and seemingly simple, the details of how it works can be confusing. If only the transmission of the subprogram code were necessary, it could be done by passing a single pointer. However, two complications arise.
首先,需要对作为参数传递的子程序激活函数的参数进行类型检查。在 C 和 C++ 中,函数不能作为参数传递,但指向函数的指针可以。指向函数的指针的类型包括函数的协议。由于协议包括所有参数类型,因此可以对此类参数进行完全类型检查。
First, there is the matter of type checking the parameters of the activations of the subprogram that was passed as a parameter. In C and C++, functions cannot be passed as parameters, but pointers to functions can. The type of a pointer to a function includes the function’s protocol. Because the protocol includes all parameter types, such parameters can be completely type checked.
参数为子程序的第二个复杂问题仅出现在允许嵌套子程序的语言中。问题是应该使用什么引用环境来执行传递的子程序。有三种选择:
The second complication with parameters that are subprograms appears only with languages that allow nested subprograms. The issue is what referencing environment for executing the passed subprogram should be used. There are three choices:
执行传递的子程序的调用语句的环境(浅绑定)
The environment of the call statement that enacts the passed subprogram (shallow binding)
传递的子程序定义的环境(深度绑定)
The environment of the definition of the passed subprogram (deep binding)
将子程序作为实际参数传递的调用语句的环境(临时绑定)
The environment of the call statement that passed the subprogram as an actual parameter (ad hoc binding)
以下用 JavaScript 语法编写的示例程序说明了这些选择:
The following example program, written with the syntax of JavaScript, illustrates these choices:
function sub1() {
var x;
function sub2() {
alert(x); // Creates a dialog box with the value of x
};
function sub3() {
var x;
x = 3;
sub4(sub2);
};
function sub4(subx) {
var x;
x = 4;
subx();
};
x = 1;
sub3();
};
考虑 sub2 在 sub4 中被调用时的执行。对于浅绑定,该执行的引用环境是 sub4 的环境,因此 sub2 中对 x 的引用绑定到 sub4 中的局部 x,程序的输出为 4。对于深绑定,sub2 执行的引用环境是 sub1 的环境,因此 sub2 中对 x 的引用绑定到 sub1 中的局部 x,输出为 1。对于临时绑定,绑定到 sub3 中的局部 x,输出为 3。
Consider the execution of sub2 when it is called in sub4. For shallow binding, the referencing environment of that execution is that of sub4, so the reference to x in sub2 is bound to the local x in sub4, and the output of the program is 4. For deep binding, the referencing environment of sub2’s execution is that of sub1, so the reference to x in sub2 is bound to the local x in sub1, and the output is 1. For ad hoc binding, the binding is to the local x in sub3, and the output is 3.
在某些情况下,声明子程序的子程序也会将该子程序作为参数传递。在这些情况下,深度绑定和临时绑定是相同的。临时绑定从未被使用过,因为人们可能会猜测,过程作为参数出现的环境与传递的子程序没有自然联系。
In some cases, the subprogram that declares a subprogram also passes that subprogram as a parameter. In those cases, deep binding and ad hoc binding are the same. Ad hoc binding has never been used because, one might surmise, the environment in which the procedure appears as a parameter has no natural connection to the passed subprogram.
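C++ has no nested named functions, but a lambda expression that captures a local variable behaves like the deep-binding case. The sketch below mirrors the JavaScript example under that assumption, with sub2 and sub3 written as lambdas and sub4 as an ordinary function.

```cpp
#include <cassert>
#include <functional>

// Receives a subprogram and calls it. The local x here is not visible
// to the passed subprogram.
int sub4(const std::function<int()> &subx) {
    int x = 4;
    (void)x;   // present only to mirror the original example
    return subx();
}

int sub1() {
    int x = 1;
    // sub2 captures the x of sub1, the environment of its definition.
    auto sub2 = [&x]() { return x; };
    // sub3 has its own x and passes sub2 on to sub4.
    auto sub3 = [&sub2]() {
        int x = 3;
        (void)x;
        return sub4(sub2);
    };
    int result = sub3();
    assert(result != 3 && result != 4);  // neither ad hoc nor shallow binding
    return result;
}
```

Calling sub1() yields 1, the value that deep binding produces in the discussion above.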
Pascal 的原始定义(Jensen and Wirth,1974)允许将子程序作为参数传递,而不包含其参数类型信息。如果可以进行独立编译(在原始 Pascal 中这是不可能的),编译器甚至不允许检查参数数量是否正确。在没有独立编译的情况下,检查参数一致性是可能的,但这是一项非常复杂的任务,通常不会这样做。
The original definition of Pascal (Jensen and Wirth, 1974) allowed subprograms to be passed as parameters without including their parameter type information. If independent compilation is possible (it was not possible in the original Pascal), the compiler cannot even check for the correct number of parameters. In the absence of independent compilation, checking for parameter consistency is possible but is a very complex task, and it usually is not done.
浅绑定不适用于具有嵌套子程序的静态作用域语言。例如,假设过程 Sender 将过程 Sent 作为参数传递给过程 Receiver。问题是 Receiver 可能不在 Sent 的静态环境中,因此让 Sent 访问 Receiver 的变量非常不自然。另一方面,在这种语言中,任何子程序(包括作为参数发送的子程序)的引用环境由其定义的词汇位置决定是完全正常的。因此,这些语言使用深度绑定更合乎逻辑。一些动态作用域语言使用浅绑定。
Shallow binding is not appropriate for static-scoped languages with nested subprograms. For example, suppose the procedure Sender passes the procedure Sent as a parameter to the procedure Receiver. The problem is that Receiver may not be in the static environment of Sent, thereby making it highly unnatural for Sent to have access to Receiver’s variables. On the other hand, it is perfectly normal in such a language for any subprogram, including one sent as a parameter, to have its referencing environment determined by the lexical position of its definition. It is therefore more logical for these languages to use deep binding. Some dynamic-scoped languages use shallow binding.
在某些情况下,必须间接调用子程序。这种情况最常发生在直到运行时才知道要调用的具体子程序。对子程序的调用是通过指向子程序的指针或引用进行的,该指针或引用在调用之前已在执行过程中设置。间接子程序调用的两种最常见应用是图形用户界面中的事件处理(现在几乎所有 Web 应用程序以及许多非 Web 应用程序都包含该事件处理)和回调(在回调中,调用子程序并指示其在被调用子程序完成其工作时通知调用者)。与往常一样,我们的兴趣不在于这些特定类型的编程,而是编程语言对它们的支持。
There are situations in which subprograms must be called indirectly. These most often occur when the specific subprogram to be called is not known until run time. The call to the subprogram is made through a pointer or reference to the subprogram, which has been set during execution before the call is made. The two most common applications of indirect subprogram calls are for event handling in graphical user interfaces, which are now part of nearly all Web applications, as well as many non-Web applications, and for callbacks, in which a subprogram is called and instructed to notify the caller when the called subprogram has completed its work. As always, our interest is not in these specific kinds of programming, but rather in programming language support for them.
间接调用子程序的概念并不是最近才出现的概念。C 和 C++ 允许程序定义指向函数的指针,通过该指针可以调用该函数。在 C++ 中,指向函数的指针根据函数的返回类型和参数类型进行类型化,因此这种指针只能指向具有一种特定协议的函数。例如,以下声明定义了一个指针 ( pfun),它可以指向任何以 afloat和 anint作为参数并返回 a 的函数float:
The concept of calling subprograms indirectly is not a recently developed concept. C and C++ allow a program to define a pointer to a function, through which the function can be called. In C++, pointers to functions are typed according to the return type and parameter types of the function, so that such a pointer can point only at functions with one particular protocol. For example, the following declaration defines a pointer (pfun) that can point to any function that takes a float and an int as parameters and returns a float:
float (*pfun)(float, int);
任何与此指针具有相同协议的函数都可以用作此指针的初始值,也可以在程序中将其赋值给指针。在 C 和 C++ 中,没有后跟括号的函数名称(就像没有后跟方括号的数组名称)是函数(或数组)的地址。因此,以下两种方式都是为指向函数的指针赋予初始值或赋值的合法方式:
Any function with the same protocol as this pointer can be used as the initial value of this pointer or be assigned to the pointer in a program. In C and C++, a function name without following parentheses, like an array name without following brackets, is the address of the function (or array). So, both of the following are legal ways of giving an initial value or assigning a value to a pointer to a function:
int myfun2 (int, int); // A function declaration
int (*pfun2)(int, int) = myfun2; // Create a pointer and
// initialize
// it to point to myfun2
pfun2 = myfun2; // Assigning a function's address to a
// pointer
myfun2现在可以使用以下任一语句调用该函数:
The function myfun2 can now be called with either of the following statements:
(*pfun2)(first, second);
pfun2(first, second);
其中第一个明确取消引用指针pfun2,这是合法的,但没有必要。
The first of these explicitly dereferences the pointer pfun2, which is legal, but unnecessary.
C 和 C++ 的函数指针可以作为参数发送并从函数返回,尽管函数不能直接用于上述任一角色。
The function pointers of C and C++ can be sent as parameters and returned from functions, although functions cannot be used directly in either of those roles.
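A short sketch of both roles follows: choose returns a pointer to a function, and apply receives one as a parameter. The typedef name binary_op and the functions add, multiply, choose, and apply are hypothetical names used only for this example.

```cpp
#include <cassert>

int add(int a, int b) { return a + b; }
int multiply(int a, int b) { return a * b; }

// A name for the pointer type keeps the declarations readable.
typedef int (*binary_op)(int, int);

// Returns a pointer to a function; a function itself cannot be returned.
binary_op choose(char symbol) {
    if (symbol == '*') {
        return multiply;
    }
    return add;
}

// Receives a pointer to a function as a parameter and calls through it.
int apply(binary_op op, int first, int second) {
    assert(op != 0);
    return op(first, second);
}
```

Because the protocol is part of the pointer type, passing a function with different parameter types to apply is rejected at compile time.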
在 C# 中,通过将方法指针变为对象,可以增强方法指针的功能和灵活性。这些指针被称为委托,因为程序不是调用方法,而是将该操作委托给委托。
In C#, the power and flexibility of method pointers is increased by making them objects. These are called delegates, because instead of calling a method, a program delegates that action to a delegate.
要使用委托,首先必须使用特定的方法协议定义委托类。委托的实例化包含可以调用的委托协议的方法名称。委托声明的语法与方法声明的语法相同,只是在delegate返回类型之前插入了保留字。例如,我们可以有以下内容:
To use a delegate, first the delegate class must be defined with a specific method protocol. An instantiation of a delegate holds the name of a method with the delegate’s protocol that it is able to call. The syntax of a declaration of a delegate is the same as that of a method declaration, except that the reserved word delegate is inserted just before the return type. For example, we could have the following:
public delegate int Change(int x);
此委托可以用任何以 int 为参数并返回 int 的方法实例化。例如,考虑以下方法声明:
This delegate can be instantiated with any method that takes an int as a parameter and returns an int. For example, consider the following method declaration:
static int fun1(int x);
可以通过将此方法的名称发送到委托的构造函数来实例化委托Change,如下所示:
The delegate Change can be instantiated by sending the name of this method to the delegate’s constructor, as in the following:
Change chgfun1 = new Change(fun1);
这可以缩短为以下内容:
This can be shortened to the following:
Change chgfun1 = fun1;
以下是通过委托 chgfun1 调用 fun1 的示例:
Following is an example call to fun1 through the delegate chgfun1:
chgfun1(12);
委托类的对象可以存储多个方法。可以使用运算符添加第二个方法+=,如下所示:
Objects of a delegate class can store more than one method. A second method can be added using the operator +=, as in the following:
chgfun1 += fun2;
这会将 fun2 放置在 chgfun1 委托中,即使 chgfun1 先前具有值 null。委托实例中存储的所有方法都按照它们在实例中放置的顺序进行调用。这称为多播委托。无论方法返回什么,只返回最后调用的方法返回的值或对象。当然,这意味着在大多数情况下,通过多播委托调用的方法返回 void。
This places fun2 in the chgfun1 delegate, even if chgfun1 previously had the value null. All of the methods stored in a delegate instance are called in the order in which they were placed in the instance. This is called a multicast delegate. Regardless of what is returned by the methods, only the value or object returned by the last one called is returned. Of course, this means that in most cases, void is returned by the methods called through a multicast delegate.
在我们的示例中,静态方法被放置在委托中Change。实例方法也可以通过委托来调用,在这种情况下,委托必须存储对方法的引用。委托也可以是泛型的。
In our example, a static method is placed in the delegate Change. Instance methods can also be called through a delegate, in which case the delegate must store a reference to the method. Delegates can also be generic.
委托用于 .NET 应用程序的事件处理。它们还用于实现闭包(参见第9.12节 )。
Delegates are used for event handling by .NET applications. They are also used to implement closures (see Section 9.12).
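The multicast behavior described earlier can be imitated in C++. The class below is not how C# implements delegates; it is a hypothetical stand-in that stores callables in a vector, calls them in insertion order, and returns only the last result. The counter call_count and the bodies of fun1 and fun2 are assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical stand-in for a delegate with the protocol int(int).
class Change {
public:
    // Plays the role of +=: appends a method to the stored list.
    void add(const std::function<int(int)> &method) {
        methods.push_back(method);
    }
    // Calls every stored method in insertion order and returns the
    // value produced by the last one.
    int operator()(int x) const {
        assert(!methods.empty());
        int result = 0;
        for (std::size_t k = 0; k < methods.size(); k++) {
            result = methods[k](x);
        }
        return result;
    }
private:
    std::vector<std::function<int(int)> > methods;
};

int call_count = 0;   // records how many stored methods were run

int fun1(int x) { call_count++; return x + 1; }
int fun2(int x) { call_count++; return x * 2; }

// Stores fun1 and then fun2, and calls both through one object.
int call_both(int x) {
    Change chgfun1;
    chgfun1.add(fun1);
    chgfun1.add(fun2);
    return chgfun1(x);
}
```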
与 C 和 C++ 的情况一样,Python 中不带括号的函数名称是指向该函数的指针。Ada 95 有指向子程序的指针,但 Java 没有。在 Python 和 Ruby 以及大多数函数式语言中,子程序被视为数据,因此可以将它们分配给变量。因此,在这些语言中,几乎不需要指向子程序的指针。
As is the case with C and C++, the name of a function in Python without the following parentheses is a pointer to that function. Ada 95 has pointers to subprograms, but Java does not. In Python and Ruby, as well as most functional languages, subprograms are treated like data, so they can be assigned to variables. Therefore, in these languages, there is little need for pointers to subprograms.
以下设计问题特定于功能:
The following design issues are specific to functions:
允许有副作用吗?
Are side effects allowed?
可以返回哪些类型的值?
What types of values can be returned?
可以返回多少个值?
How many values can be returned?
由于在表达式中调用的函数存在副作用问题(如第5章 所述),函数的参数应始终处于模式内。事实上,有些语言要求这样做;例如,Ada 函数只能具有模式内形式参数。此要求有效地防止函数通过其参数或通过参数和全局变量的别名引起副作用。然而,在大多数其他命令式语言中,函数可以具有按值传递或按引用传递的参数,从而允许引起副作用和别名的函数。
Because of the problems of side effects of functions that are called in expressions, as described in Chapter 5, parameters to functions should always be in-mode. In fact, some languages require this; for example, Ada functions can have only in-mode formal parameters. This requirement effectively prevents a function from causing side effects through its parameters or through aliasing of parameters and globals. In most other imperative languages, however, functions can have either pass-by-value or pass-by-reference parameters, thus allowing functions that cause side effects and aliasing.
纯函数式语言(例如 Haskell)没有变量,因此它们的函数不会产生副作用。
Pure functional languages, such as Haskell, do not have variables, so their functions cannot have side effects.
大多数命令式编程语言都会限制其函数可以返回的类型。C 允许其函数返回除数组和函数之外的任何类型。这两者都可以通过指针类型返回值来处理。C++ 与 C 类似,但也允许从其函数返回用户定义的类型或类。在当前的命令式语言中,只有 Ada、Python 和 Ruby 的函数(和/或方法)可以返回任何类型的值。但是,对于 Ada,由于函数不是 Ada 中的类型,因此它们不能从函数返回。当然,函数可以返回指向函数的指针。
Most imperative programming languages restrict the types that can be returned by their functions. C allows any type to be returned by its functions except arrays and functions. Both of these can be handled by pointer type return values. C++ is like C but also allows user-defined types, or classes, to be returned from its functions. Ada, Python, and Ruby are the only languages among current imperative languages whose functions (and/or methods) can return values of any type. In the case of Ada, however, because functions are not types in Ada, they cannot be returned from functions. Of course, pointers to functions can be returned by functions.
在某些编程语言中,子程序是一等对象,这意味着它们可以作为参数传递、从函数返回以及分配给变量。在某些命令式语言(例如 Python 和 Ruby)中,方法是一等对象。大多数函数式语言中的函数也是如此。
In some programming languages, subprograms are first-class objects, which means that they can be passed as parameters, returned from functions, and assigned to variables. Methods are first-class objects in some imperative languages, for example, Python and Ruby. The same is true for the functions in most functional languages.
Java 和 C# 都不能有函数,尽管它们的方法类似于函数。在两者中,任何类型或类都可以通过方法返回。因为方法不是类型,所以它们不能被返回。
Neither Java nor C# can have functions, although their methods are similar to functions. In both, any type or class can be returned by methods. Because methods are not types, they cannot be returned.
在大多数语言中,函数只能返回一个值。然而,情况并非总是如此。Ruby 允许从方法返回多个值。如果returnRuby 方法中的语句后面没有表达式,则nil返回。如果后面跟着一个表达式,则返回表达式的值。如果后面跟着多个表达式,则返回所有表达式的值的数组。
In most languages, only a single value can be returned from a function. However, that is not always the case. Ruby allows the return of more than one value from a method. If a return statement in a Ruby method is not followed by an expression, nil is returned. If followed by one expression, the value of the expression is returned. If followed by more than one expression, an array of the values of all of the expressions is returned.
在 ML、F# 和 Python 以及一些其他包含元组的语言中,可以通过将多个值放在元组中来返回它们。
In ML, F#, and Python, and some other languages that include tuples, multiple values can be returned by placing them in a tuple.
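C++ also provides a tuple type in its standard library, so the same technique can be sketched there. The functions divide and check_divide are hypothetical examples.

```cpp
#include <cassert>
#include <tuple>

// Returns the quotient and the remainder together by packing them
// into a tuple.
std::tuple<int, int> divide(int dividend, int divisor) {
    assert(divisor != 0);
    return std::make_tuple(dividend / divisor, dividend % divisor);
}

// Unpacks the returned tuple into two variables with std::tie and
// reconstructs the dividend from them.
int check_divide(int dividend, int divisor) {
    int quotient, remainder;
    std::tie(quotient, remainder) = divide(dividend, divisor);
    return quotient * divisor + remainder;
}
```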
重载运算符具有多重含义。重载运算符的特定实例的含义由其操作数的类型决定。例如,如果*Java 程序中的运算符有两个浮点操作数,则它指定浮点乘法。但如果同一运算符有两个整数操作数,则它指定整数乘法。
An overloaded operator is one that has multiple meanings. The meaning of a particular instance of an overloaded operator is determined by the types of its operands. For example, if the * operator has two floating-point operands in a Java program, it specifies floating-point multiplication. But if the same operator has two integer operands, it specifies integer multiplication.
重载子程序是与同一引用环境中的另一个子程序同名的子程序。每个版本的重载子程序都必须具有唯一的协议;也就是说,它的参数数量、顺序或类型必须与其他版本不同,并且可能在返回类型上也不同。对重载子程序的调用的含义由实际参数列表和/或可能返回值的类型决定。虽然这不是必需的,但重载子程序通常实现相同的过程。
An overloaded subprogram is a subprogram that has the same name as another subprogram in the same referencing environment. Every version of an overloaded subprogram must have a unique protocol; that is, it must be different from the others in the number, order, or types of its parameters, and possibly in its return type. The meaning of a call to an overloaded subprogram is determined by the actual parameter list and/or possibly the type of the returned value. Although it is not necessary, overloaded subprograms usually implement the same process.
C++、Java 和 C# 包含预定义的重载子程序。例如,C++、Java 和 C# 中的许多类都有重载构造函数。由于每个版本的重载子程序都有唯一的参数配置文件,因此编译器可以通过不同的类型参数消除对它们的调用的歧义。不幸的是,事情并没有那么简单。参数强制转换(如果允许)会使消除歧义过程变得非常复杂。简单地说,问题是,如果没有方法的参数配置文件与方法调用中实际参数的数量和类型相匹配,但两个或多个方法的参数配置文件可以通过强制转换进行匹配,那么应该调用哪个方法?语言设计者要回答这个问题,必须决定如何对所有不同的强制转换进行排序,以便编译器可以选择与调用“最”匹配的方法。这可能是一项复杂的任务。要了解这个过程的复杂程度,我们建议读者参考 C++ 中使用的方法调用歧义消除规则(Stroustrup,1997)。
C++, Java, and C# include predefined overloaded subprograms. For example, many classes in C++, Java, and C# have overloaded constructors. Because each version of an overloaded subprogram has a unique parameter profile, the compiler can disambiguate occurrences of calls to them by the types of their parameters. Unfortunately, it is not that simple. Parameter coercions, when allowed, complicate the disambiguation process enormously. Simply stated, the issue is that if no method’s parameter profile matches the number and types of the actual parameters in a method call, but two or more methods have parameter profiles that can be matched through coercions, which method should be called? For a language designer to answer this question, he or she must decide how to rank all of the different coercions, so that the compiler can choose the method that “best” matches the call. This can be a complicated task. To understand the level of complexity of this process, we suggest the reader refer to the rules for disambiguation of method calls used in C++ (Stroustrup, 1997).
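A small C++ sketch shows the ranking at work. With the two hypothetical overloads below, neither a char nor a float actual parameter matches a profile exactly, yet each call is resolved because C++ ranks promotions (char to int, float to double) above other conversions.

```cpp
#include <cassert>
#include <string>

// Two versions of kind that differ only in parameter type.
std::string kind(int) { return "int"; }
std::string kind(double) { return "double"; }

// Neither call matches exactly; each is resolved through a promotion.
void check_ranking() {
    assert(kind('a') == "int");       // char is promoted to int
    assert(kind(2.5f) == "double");   // float is promoted to double
    // kind(7L) would be rejected as ambiguous: long to int and long to
    // double are both ordinary conversions of equal rank.
}
```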
由于 C++、Java 和 C# 允许混合模式表达式,因此返回类型与重载函数(或方法)的歧义消除无关。调用上下文不允许确定返回类型。例如,如果 C++ 程序有两个名为 fun 的函数,并且都接受一个 int 参数,但一个返回 int,另一个返回 float,则该程序将无法编译,因为编译器无法确定应使用哪个版本的 fun。
Because C++, Java, and C# allow mixed-mode expressions, the return type is irrelevant to disambiguation of overloaded functions (or methods). The context of the call does not allow the determination of the return type. For example, if a C++ program has two functions named fun and both take an int parameter but one returns an int and one returns a float, the program would not compile, because the compiler could not determine which version of fun should be used.
用户还可以在 Java、C++、C# 和 F# 中编写具有相同名称的多个版本的子程序。同样,在 C++、Java 和 C# 中,最常见的用户定义重载方法是构造函数。
Users are also allowed to write multiple versions of subprograms with the same name in Java, C++, C#, and F#. Once again, in C++, Java, and C# the most common user-defined overloaded methods are constructors.
具有默认参数的重载子程序可能会导致子程序调用不明确。例如,请考虑以下 C++ 代码:
Overloaded subprograms that have default parameters can lead to ambiguous subprogram calls. For example, consider the following C++ code:
void fun(float b = 0.0);
void fun();
. . .
fun();
该调用有歧义,会导致编译错误。
The call is ambiguous and will cause a compilation error.
Software reuse can be an important contributor to software productivity. One way to increase the reusability of software is to lessen the need to create different subprograms that implement the same algorithm on different types of data. For example, a programmer should not need to write four different sort subprograms to sort four arrays that differ only in element type.
A polymorphic subprogram takes parameters of different types on different activations. Overloaded subprograms provide a particular kind of polymorphism called ad hoc polymorphism. Overloaded subprograms need not behave similarly.
Languages that support object-oriented programming usually support subtype polymorphism. Subtype polymorphism means that a variable of type T can access any object of type T or any type derived from T.
A more general kind of polymorphism is provided by the methods of Python and Ruby. Recall that variables in these languages do not have types, so formal parameters do not have types. Therefore, a method will work for any type of actual parameter, as long as the operators used on the formal parameters in the method are defined.
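The following is a minimal Python sketch of this kind of polymorphism. The function largest and the class Version are hypothetical names introduced only for illustration. The one function definition accepts any actual parameters for which the > operator is defined, including objects of a user-defined class that supplies its own comparison.

```python
def largest(first, second):
    """Return the larger of two values; works for any type that defines >."""
    return first if first > second else second


class Version:
    """Hypothetical user-defined type that supplies its own > operator."""

    def __init__(self, major, minor):
        self.major = major
        self.minor = minor

    def __gt__(self, other):
        return (self.major, self.minor) > (other.major, other.minor)


print(largest(3, 7))              # int parameters
print(largest("pear", "apple"))   # str parameters
newer = largest(Version(1, 4), Version(1, 10))
print(newer.major, newer.minor)   # user-defined parameters
```

If the operator is not defined for the actual parameters, as in largest(3, "pear"), the error is detected only when the comparison is executed.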
Parametric polymorphism is provided by a subprogram that takes generic parameters that are used in type expressions that describe the types of the parameters of the subprogram. Different instantiations of such subprograms can be given different generic parameters, producing subprograms that take different types of parameters. Parametric definitions of subprograms all behave the same. Parametrically polymorphic subprograms are often called generic subprograms. C++, Java, C#, and F# provide a kind of compile-time parametric polymorphism.
Generic functions in C++ have the descriptive name of template functions. The definition of a template function has the general form
template <template parameters>
—a function definition that may include the template parameters
A template parameter (there must be at least one) has one of the forms
class identifier
typename identifier
Both forms declare a type parameter; in a template parameter list, class and typename are interchangeable. A value is passed to the template function through a nontype parameter, which is declared with its type, as in int size. For example, it is sometimes convenient to pass an integer value for the size of an array in the template function.
A template can take another template, in practice often a template class that defines a user-defined generic type, as a parameter, but we do not consider that option here.8
As an example of a template function, consider the following:
template <class Type>
Type max(Type first, Type second) {
    return first > second ? first : second;
}
where Type is the parameter that specifies the type of data on which the function will operate. This template function can be instantiated for any type for which the operator > is defined. For example, if it were instantiated with int as the parameter, it would be
int max(int first, int second) {
    return first > second ? first : second;
}
Although this process could be defined as a macro, a macro would have the disadvantage of not operating correctly if the parameters were expressions with side effects. For example, suppose the macro were defined as
#define max(a, b) ((a) > (b) ? (a) : (b))
This definition is generic in the sense that it works for any numeric type. However, it does not always work correctly if called with a parameter that has a side effect, such as
max(x++, y)
which produces
((x++) > (y) ? (x++) : (y))
Whenever the value of x is greater than that of y, x will be incremented twice.
C++ template functions are instantiated implicitly either when the function is named in a call or when its address is taken with the & operator. For example, the example template function max would be instantiated twice by the following code segment—once for int type parameters and once for char type parameters:
int a, b, c;
char d, e, f;
. . .
c = max(a, b);
f = max(d, e);
The following is a C++ generic sort subprogram:
template <class Type>
void generic_sort(Type list[], int len) {
    int top, bottom;
    Type temp;
    for (top = 0; top < len - 1; top++)
        for (bottom = top + 1; bottom < len; bottom++)
            if (list[top] > list[bottom]) {
                temp = list[top];
                list[top] = list[bottom];
                list[bottom] = temp;
            } //** end of if (list[top] . . .
} //** end of generic_sort
The following is an example instantiation of this template function:
float flt_list[100];
. . .
generic_sort(flt_list, 100);
The templated functions of C++ are a kind of poor cousin to a subprogram in which the types of the formal parameters are dynamically bound to the types of the actual parameters in a call. In this case, only a single copy of the code is needed, whereas with the C++ approach, a copy must be created at compile time for each different type that is required and the binding of subprogram calls to subprograms is static.
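For contrast, here is a sketch of the dynamically bound alternative, written in Python and assuming the same exchange-sort idea as the C++ generic_sort. The name generic_sort and the sample lists are reused or invented for illustration only. A single copy of the code serves every element type for which > is defined, and the choice of comparison is made at run time.

```python
def generic_sort(items):
    """Sort a list in place, comparing each element with every later one."""
    length = len(items)
    for top in range(length - 1):
        for bottom in range(top + 1, length):
            if items[top] > items[bottom]:
                # Exchange the two out-of-order elements.
                items[top], items[bottom] = items[bottom], items[top]


flt_list = [3.5, 1.25, 2.0]
names = ["pear", "apple", "fig"]
generic_sort(flt_list)   # same code, float elements
generic_sort(names)      # same code, string elements
print(flt_list)
print(names)
```

The cost of this flexibility is that a type error, such as a list that mixes numbers and strings, is found only when the offending comparison is executed.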
Support for generic types and methods was added to Java in Java 5.0. The name of a generic class in Java 5.0 is specified by a name followed by one or more type variables delimited by angle brackets. For example,
generic_class<T>
where T is the type variable. Generic types are discussed in more detail in Chapter 11.
Java’s generic methods differ from the generic subprograms of C++ in several important ways. First, generic parameters must be classes—they cannot be primitive types. This requirement disallows a generic method that mimics our example in C++, in which the component types of arrays are generic and can be primitives. In Java, the components of arrays (as opposed to containers) cannot be generic. Second, although Java generic methods can be instantiated any number of times, only one copy of the code is built. The internal version of a generic method, which is called a raw method, operates on Object class objects. At the point where the generic value of a generic method is returned, the compiler inserts a cast to the proper type. Third, in Java, restrictions can be specified on the range of classes that can be passed to the generic method as generic parameters. Such restrictions are called bounds.
As an example of a generic Java 5.0 method, consider the following skeletal method definition:
public static <T> T doIt(T[] list) {
    . . .
}
This defines a method named doIt that takes an array of elements of a generic type. The name of the generic type is T, and the actual parameter must be an array whose elements are of that type. Following is an example call to doIt:
doIt<String>(myList);
Now, consider the following version of doIt, which has a bound on its generic parameter:
public static <T extends Comparable> T doIt(T[] list) {
. . .
}
This defines a method that takes a generic array parameter whose elements are of a class that implements the Comparable interface. That is the restriction, or bound, on the generic parameter. The reserved word extends seems to imply that the generic class subclasses the following class. In this context, however, extends has a different meaning. The expression <T extends BoundingType> specifies that T should be a “subtype” of the bounding type. So, in this context, extends means the generic class (or interface) either extends the bounding class (the bound if it is a class) or implements the bounding interface (if the bound is an interface). The bound ensures that the elements of any instantiation of the generic can be compared with the Comparable method, compareTo.
If a generic method has two or more restrictions on its generic type, they are added to the extends clause, separated by ampersands (&). Also, generic methods can have more than one generic parameter.
Java 5.0 supports wildcard types. For example, Collection<?> is a wildcard type for collection classes. This type can be used for any collection type of any class components. For example, consider the following generic method:
void printCollection(Collection<?> c) {
    for (Object e: c) {
        System.out.println(e);
    }
}
This method prints the elements of any Collection class, regardless of the class of its components. Some care must be taken with objects of the wildcard type. For example, because the components of a particular object of this type have a type, other type objects cannot be added to the collection. For example, consider
Collection<?> c = new ArrayList<String>();
It would be illegal to use the add method to put anything other than null into this collection through c, because the compiler does not know which component type the wildcard stands for.
Wildcard types can be restricted, as is the case with nonwildcard types. Such types are called bounded wildcard types. For example, consider the following method header:
public void drawAll(ArrayList<? extends Shape> things)
The generic type here is a wildcard type that is a subclass of the Shape class. This method could be written to draw any object whose type is a subclass of Shape.
The generic methods of C# 2005 are similar in capability to those of Java 5.0, except there is no support for wildcard types. One unique feature of C# 2005 generic methods is that the actual type parameters in a call can be omitted if the compiler can infer the unspecified type. For example, consider the following skeletal class definition:
class MyClass {
    public static T DoIt<T>(T p1) {
        . . .
    }
}
The method DoIt can be called without specifying the generic parameter if the compiler can infer the generic type from the actual parameter in the call. For example, both of the following calls are legal:
int myInt = MyClass.DoIt(17);          // Calls DoIt<int>
string myStr = MyClass.DoIt("apples"); // Calls DoIt<string>
The type inferencing system of F# is not always able to determine the type of parameters or the return type of a function. When this is the case, for some functions F# infers a generic type for the parameters and the return value. This is called automatic generalization. For example, consider the following function definition:
let getLast (a, b, c) = c;;
Because no type information was included, the types of the parameters and the return value are all inferred to be generic. Because this function does not include any computations, this is a simple generic function.
Functions can be defined to have generic parameters, as in the following example:
let printPair (x: 'a) (y: 'a) =
    printfn "%A %A" x y;;
The %A format specification is for any type. The apostrophe in front of the type named a specifies it to be a generic type.9 This function definition works (with generic parameters) because no type-constrained operation is included. Arithmetic operators are examples of type-constrained operations. For example, consider the following function definition:
let adder x y = x + y;;
Type inferencing sets the type of x and y and the return value to int. Because there is no type coercion in F#, the following call is illegal:
adder 2.5 3.6;;
Even if the type of the parameters were set to be generic, the + operator would cause the types of x and y to be int.
The generic type could also be specified explicitly in angle brackets, as in the following:
let printPair2<'T> x y =
    printfn "%A %A" x y;;
This function must be called with a type,10 as in the following:
printPair2<float> 3.5 2.4;;
Because of type inferencing and the lack of type coercions, F# generic functions are far less useful, especially for numeric computations, than those of C++, Java, and C#.
Operators can be overloaded by the user in Ada, C++, Python, and Ruby. Suppose that a Python class is developed to support complex numbers and arithmetic operations on them. A complex number can be represented with two floating-point values. The Complex class would have members for these two named real and imag. In Python, binary arithmetic operations are implemented as method calls sent to the first operand, sending the second operand as a parameter. For addition, the method is named __add__. For example, the expression x + y is implemented as x.__add__(y). To overload + for the addition of objects of the new Complex class, we only need to provide Complex with a method named __add__ that performs the operation. Following is such a method:
def __add__(self, second):
    return Complex(self.real + second.real,
                   self.imag + second.imag)
In most languages that support object-oriented programming, a reference to the current object is implicitly sent with each method call. In Python, this reference must be sent explicitly; that is the reason why self is the first parameter to our method, __add__.
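To make the __add__ method runnable, it needs an enclosing class. The following is a minimal sketch of such a class. The constructor and the sample values are assumptions added for illustration, and only addition is supported.

```python
class Complex:
    """Minimal complex-number class; only addition is supported."""

    def __init__(self, real, imag):
        self.real = real
        self.imag = imag

    def __add__(self, second):
        # Python evaluates x + y as x.__add__(y); self is the first operand.
        return Complex(self.real + second.real,
                       self.imag + second.imag)


x = Complex(1.5, 2.0)
y = Complex(0.5, -1.0)
total = x + y
print(total.real, total.imag)
```

Here x is bound to self and y to second, so the explicit first parameter plays the role that the implicit current-object reference plays in other languages.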
The example add method could be written for a complex class in C++ as follows11 :
Complex operator +(Complex &second) {
    return Complex(real + second.real, imag + second.imag);
}
Defining a closure is a simple matter; a closure is a subprogram and the referencing environment where it was defined. The referencing environment is needed if the subprogram can be called from any arbitrary place in the program. Explaining a closure is not so simple.
If a static-scoped programming language does not allow nested subprograms, closures are not useful, so such languages do not support them. All of the variables in the referencing environment of a subprogram in such a language (its local variables and the global variables) are accessible, regardless of the place in the program where the subprogram is called.
When subprograms can be nested, in addition to locals and globals, the referencing environment of a subprogram can include variables defined in all enclosing subprograms. However, this is not an issue if the subprogram can be called only in places where all of the enclosing scopes are active and visible. It becomes an issue if a subprogram can be called elsewhere. This can happen if the subprogram can be passed as a parameter or assigned to a variable, thereby allowing it to be called from virtually anywhere in the program. There is an associated problem: The subprogram could be called after one or more of its nesting subprograms has terminated, which normally means that the variables defined in such nesting subprograms have been deallocated—they no longer exist. For the subprogram to be callable from anywhere in the program, its referencing environment must be available wherever it might be called. Therefore, the variables defined in nesting subprograms may need lifetimes that are of the entire program, rather than just the time during which the subprogram in which they were defined is active. A variable whose lifetime is that of the whole program is said to have unlimited extent. This usually means they must be heap dynamic, rather than stack dynamic.
Nearly all functional programming languages, most scripting languages, and at least one primarily imperative language, C#, support closures. These languages are static-scoped, allow nested subprograms,12 and allow subprograms to be passed as parameters. Following is an example of a closure written in JavaScript:
function makeAdder(x) {
    return function(y) {return x + y;}
}
. . .
var add10 = makeAdder(10);
var add5 = makeAdder(5);
document.write("Add 10 to 20: " + add10(20) +
               "<br />");
document.write("Add 5 to 20: " + add5(20) +
               "<br />");
The output of this code, assuming it was embedded in an HTML document and displayed with a browser, is as follows:
Add 10 to 20: 30
Add 5 to 20: 25
In this example, the closure is the anonymous function defined inside the makeAdder function, which makeAdder returns. The variable x referenced in the closure function is bound to the parameter that was sent to makeAdder. The makeAdder function is called twice, once with a parameter of 10 and once with 5. Each of these calls returns a different version of the closure because they are bound to different values of x. The first call to makeAdder creates a function that adds 10 to its parameter; the second creates a function that adds 5 to its parameter. The two versions of the function are bound to different activations of makeAdder. Obviously, the lifetime of the version of x created when makeAdder is called must extend over the lifetime of the program.
This same closure function can be written in C# using a nested anonymous delegate. The type of the nesting method is specified to be a function that takes an int as a parameter and returns an anonymous delegate. The return type is specified with the special notation for such delegates, Func<int, int>. The first type in the angle brackets is the parameter type. Such a delegate can encapsulate methods that have only one parameter. The second type is the return type of the method encapsulated by the delegate.
static Func<int, int> makeAdder(int x) {
    return delegate(int y) { return x + y;};
}
. . .
Func<int, int> Add10 = makeAdder(10);
Func<int, int> Add5 = makeAdder(5);
Console.WriteLine("Add 10 to 20: {0}", Add10(20));
Console.WriteLine("Add 5 to 20: {0}", Add5(20));
The output of this code is exactly the same as for the previous JavaScript closure example.
The anonymous delegate could have been written as a lambda expression. The following is a replacement for the body of the makeAdder method, using a lambda expression instead of the delegate:
return y => x + y;
Ruby’s blocks are implemented so that they can reference variables visible in the position in which they were defined, even if they are called at a place in which those variables would have disappeared. This makes such blocks closures.
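The makeAdder example above can also be sketched in Python, which is static-scoped and allows nested function definitions. The names make_adder and adder are chosen here for illustration. The last line inspects the saved referencing environment through the function's __closure__ attribute, which shows that the value of x outlives the activation of make_adder that created it.

```python
def make_adder(x):
    def adder(y):
        # x is not local to adder; it comes from the enclosing activation.
        return x + y
    return adder


add10 = make_adder(10)
add5 = make_adder(5)
print("Add 10 to 20:", add10(20))
print("Add 5 to 20:", add5(20))

# Each closure keeps its own binding of x after make_adder has returned.
print(add10.__closure__[0].cell_contents, add5.__closure__[0].cell_contents)
```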
A coroutine is a special kind of subprogram. Rather than the master-slave relationship between a caller and a called subprogram that exists with conventional subprograms, caller and called coroutines are more equitable. In fact, the coroutine control mechanism is often called the symmetric unit control model.
Coroutines can have multiple entry points, which are controlled by the coroutines themselves. They also have the means to maintain their status between activations. This means that coroutines must be history sensitive and thus have static local variables. Secondary executions of a coroutine often begin at points other than its beginning. Because of this, the invocation of a coroutine is called a resume rather than a call.
For example, consider the following skeletal coroutine:
sub co1(){
    . . .
    resume co2();
    . . .
    resume co3();
    . . .
}
The first time co1 is resumed, its execution begins at the first statement and executes down to and including the resume of co2, which transfers control to co2. The next time co1 is resumed, its execution begins at the first statement after its resume of co2. When co1 is resumed the third time, its execution begins at the first statement after the resume of co3.
One of the usual characteristics of subprograms is maintained in coroutines: Only one coroutine is actually in execution at a given time.
As seen in the example above, rather than executing to its end, a coroutine often partially executes and then transfers control to some other coroutine, and when restarted, a coroutine resumes execution just after the statement it used to transfer control elsewhere. This sort of interleaved execution sequence is related to the way multiprogramming operating systems work. Although there may be only one processor, all of the executing programs in such a system appear to run concurrently while sharing the processor. In the case of coroutines, this is sometimes called quasi-concurrency.
Typically, coroutines are created in an application by a program unit called the master unit, which is not a coroutine. When created, coroutines execute their initialization code and then return control to that master unit. When the entire family of coroutines is constructed, the master program resumes one of the coroutines, and the members of the family of coroutines then resume each other in some order until their work is completed, if in fact it can be completed. If the execution of a coroutine reaches the end of its code section, control is transferred to the master unit that created it. This is the mechanism for ending execution of the collection of coroutines, when that is desirable. In some programs, the coroutines run whenever the computer is running.
One example of a problem that can be solved with this sort of collection of coroutines is a card game simulation. Suppose the game has four players who all use the same strategy. Such a game can be simulated by having a master program unit create a family of four coroutines, each with a collection, or hand, of cards. The master program could then start the simulation by resuming one of the player coroutines, which, after it had played its turn, could resume the next player coroutine, and so forth until the game ended.
Suppose program units A and B are coroutines. Figure 9.3 shows two ways an execution sequence involving A and B might proceed.
In Figure 9.3a, the execution of coroutine A is started by the master unit. After some execution, A starts B. When coroutine B in Figure 9.3a first causes control to return to coroutine A, the semantics is that A continues from where it ended its last execution. In particular, its local variables have the values left them by the previous activation. Figure 9.3b shows an alternative execution sequence of coroutines A and B. In this case, B is started by the master unit.
Rather than have the patterns shown in Figure 9.3, a coroutine often has a loop containing a resume. Figure 9.4 shows the execution sequence of this scenario. In this case, A is started by the master unit. Inside its main loop, A resumes B, which in turn resumes A in its main loop.
The generators of Python are a form of coroutines.13
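As a rough sketch of how generators approximate coroutines, the card game described earlier can be simulated with two players. The names player and master and the sample hands are hypothetical. A Python generator always hands control back to whatever resumed it, so in this sketch the master unit resumes the players in turn rather than the players resuming each other, which is a departure from the symmetric model.

```python
def player(name, hand):
    """A player coroutine: plays one card per resume, then gives up control."""
    for card in hand:
        # Execution pauses here; the next resume continues after this point,
        # with name, hand, and the loop position all preserved.
        yield f"{name} plays {card}"


def master(players):
    """Master unit: resumes each player in turn until every hand is empty."""
    log = []
    active = list(players)
    while active:
        for p in list(active):
            try:
                log.append(next(p))   # resume the coroutine
            except StopIteration:
                active.remove(p)      # this player reached the end of its code
    return log


game_log = master([player("A", ["2H", "9S"]), player("B", ["KD", "3C"])])
for line in game_log:
    print(line)
```

Each call to next resumes a player just after the yield at which it last gave up control, which is the history-sensitive behavior that distinguishes a resume from a call.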
Process abstractions are represented in programming languages by subprograms. A subprogram definition describes the actions represented by the subprogram. A subprogram call enacts those actions. A subprogram header identifies a subprogram definition and provides its interface, which is called its protocol.
Formal parameters are the names that subprograms use to refer to the actual parameters given in subprogram calls. In Python and Ruby, array and hash formal parameters are used to support variable numbers of parameters. JavaScript also supports variable numbers of parameters. Actual parameters can be associated with formal parameters by position or by keyword. Parameters can have default values.
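A short Python sketch ties these parameter features together. The function report and its parameter names are invented for illustration. The formal parameter values collects any extra positional actual parameters, labels collects any extra keyword actual parameters, and sep has a default value.

```python
def report(title, *values, sep=", ", **labels):
    """Build one line of text from a variable number of parameters."""
    parts = [str(v) for v in values]                           # positional extras
    parts += [f"{key}={val}" for key, val in labels.items()]   # keyword extras
    return f"{title}: {sep.join(parts)}"


print(report("scores", 90, 85))                  # sep takes its default value
print(report("scores", 90, sep=" | ", best=90))  # sep bound by keyword
```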
Subprograms can be either functions, which model mathematical functions and are used to define new operations, or procedures, which define new statements.
Local variables in subprograms can be stack dynamic, providing support for recursion, or static, providing efficiency and history-sensitive local variables.
JavaScript, Python, Ruby, and Swift allow subprogram definitions to be nested.
There are three fundamental semantics models of parameter passing—in mode, out mode, and inout mode—and a number of approaches to implement them. These are pass-by-value, pass-by-result, pass-by-value-result, pass-by-reference, and pass-by-name. In most languages, parameters are passed in the run-time stack.
Aliasing can occur when pass-by-reference parameters are used, both among two or more parameters and between a parameter and an accessible nonlocal variable.
Parameters that are multidimensioned arrays pose some issues for the language designer, because the called subprogram needs to know how to compute the storage mapping function for them. This requires more than just the name of the array.
Parameters that are subprogram names provide a necessary service but can be difficult to understand. The opacity lies in the referencing environment that is available when a subprogram that has been passed as a parameter is executed.
C and C++ support pointers to functions. C# has delegates, which are objects that can store references to methods. Delegates can support multicast calls by storing more than one method reference.
Ada, C++, C#, Ruby, and Python allow both subprogram and operator overloading. Subprograms can be overloaded as long as the various versions can be disambiguated by the types of their parameters or returned values. Function definitions can be used to build additional meanings for operators.
Subprograms in C++, Java 5.0, and C# 2005 can be generic, using parametric polymorphism, so the desired types of their data objects can be passed to the compiler, which then can construct units for the requested types.
The designer of a function facility in a language must decide what restrictions will be placed on the returned values, as well as the number of return values.
A closure is a subprogram and its referencing environment. Closures are useful in languages that allow nested subprograms, are static-scoped, and allow subprograms to be returned from functions and assigned to variables.
A coroutine is a special subprogram that has multiple entries. It can be used to provide interleaved execution of subprograms.
What are the three general characteristics of subprograms?
What does it mean for a subprogram to be active?
What is given in the header of a subprogram?
What characteristic of Python subprograms sets them apart from those of other languages?
What languages allow a variable number of parameters?
What is a Ruby array formal parameter?
What is a parameter profile? What is a subprogram protocol?
What are formal parameters? What are actual parameters?
What are the advantages and disadvantages of keyword parameters?
What are the differences between a function and a procedure?
What are the design issues for subprograms?
What are the advantages and disadvantages of dynamic local variables?
What are the advantages and disadvantages of static local variables?
What languages allow subprogram definitions to be nested?
What are the three semantics models of parameter passing?
What are the modes, the conceptual models of transfer, the advantages, and the disadvantages of pass-by-value, pass-by-result, pass-by-value-result, and pass-by-reference parameter-passing methods?
Describe the ways that aliases can occur with pass-by-reference parameters.
What is the difference between the way original C and C89 deal with an actual parameter whose type is not identical to that of the corresponding formal parameter?
What are two fundamental design considerations for parameter-passing methods?
Describe the problem of passing multidimensioned arrays as parameters.
What is the name of the parameter-passing method used in Ruby?
What are the two issues that arise when subprogram names are parameters?
Define shallow and deep binding for referencing environments of subprograms that have been passed as parameters.
What is an overloaded subprogram?
What is parametric polymorphism?
What causes a C++ template function to be instantiated?
In what fundamental ways do the generic parameters to a Java 5.0 generic method differ from those of C++ methods?
If a Java 5.0 method returns a generic type, what type of object is actually returned?
If a Java 5.0 generic method is called with three different generic parameters, how many versions of the method will be generated by the compiler?
What are the design issues for functions?
Name two languages that allow multiple values to be returned from a function.
What exactly is a delegate?
What is the main drawback of generic functions in F#?
What is a closure?
What are the language characteristics that make closures useful?
What languages allow the user to overload operators?
In what ways are coroutines different from conventional subprograms?
What are arguments for and against a user program building additional definitions for existing operators, as can be done in Python and C++? Do you think such user-defined operator overloading is good or bad? Support your answer.
In most Fortran IV implementations, all parameters were passed by reference, using access path transmission only. State both the advantages and disadvantages of this design choice.
Argue in support of the Ada 83 designers’ decision to allow the implementor to choose between implementing inout-mode parameters by copy or by reference.
Suppose you want to write a method that prints a heading on a new output page, along with a page number that is 1 in the first activation and that increases by 1 with each subsequent activation. Can this be done without parameters and without reference to nonlocal variables in Java? Can it be done in C#?
Consider the following program written in C syntax:
void swap(int a, int b) {
    int temp;
    temp = a;
    a = b;
    b = temp;
}
void main() {
    int value = 2, list[5] = {1, 3, 5, 7, 9};
    swap(value, list[0]);
    swap(list[0], list[1]);
    swap(value, list[value]);
}
For each of the following parameter-passing methods, what are all of the values of the variables value and list after each of the three calls to swap?
Passed by value
Passed by reference
Passed by value-result
Present one argument against providing both static and dynamic local variables in subprograms.
Consider the following program written in C syntax:
void fun (int first, int second) {
    first += first;
    second += second;
}
void main() {
    int list[2] = {1, 3};
    fun(list[0], list[1]);
}
For each of the following parameter-passing methods, what are the values of the list array after execution?
Passed by value
Passed by reference
Passed by value-result
Argue against the C design of providing only function subprograms.
From a textbook on Fortran, learn the syntax and semantics of statement functions. Justify their existence in Fortran.
Study the methods of user-defined operator overloading in C++ and Ada, and write a report comparing the two using our criteria for evaluating languages.
C# supports out-mode parameters, but neither Java nor C++ does. Explain the difference.
Research Jensen’s Device, which was a use of pass-by-name parameters, and write a short description of what it is and how it can be used.
Study the iterator mechanisms of Ruby and CLU and list their similarities and differences.
Speculate on the issue of allowing nested subprograms in programming languages—why are they not allowed in many contemporary languages?
What are at least two arguments against the use of pass-by-name parameters?
Write a detailed comparison of the generic subprograms of Java 5.0 and C# 2005.
Write a program in a language that you know to determine the ratio of the time required to pass a large array by reference and the time required to pass the same array by value. Make the array as large as possible on the machine and implementation you use. Pass the array as many times as necessary to get reasonably accurate timings of the passing operations.
Write a C# or Ada program that determines when the address of an out-mode parameter is computed (at the time of the call or at the time the execution of the subprogram finishes).
Write a Perl program that passes by reference a literal to a subprogram, which attempts to change the parameter. Given the overall design philosophy of Perl, explain the results.
Repeat Programming Exercise 3 in C#.
Write a program in some language that has both static and stack-dynamic local variables in subprograms. Create six large (at least ) matrices in the subprogram—three static and three stack dynamic. Fill two of the static matrices and two of the stack-dynamic matrices with random numbers in the range of 1 to 100. The code in the subprogram must perform a large number of matrix multiplication operations on the static matrices and time the process. Then it must repeat this with the stack-dynamic matrices. Compare and explain the results.
Write a C# program that includes two methods that are called a large number of times. Both methods are passed a large array, one by value and one by reference. Compare the times required to call these two methods and explain the difference. Be sure to call them a sufficient number of times to illustrate a difference in the required time.
Write a program, using the syntax of whatever language you like, that produces different behavior depending on whether pass-by-reference or pass-by-value-result is used in its parameter passing.
Write a generic C++ function that takes an array of generic elements and a scalar of the same type as the array elements. The type of the array elements and the scalar is the generic parameter. The function must search the given array for the given scalar and return the subscript of the scalar in the array. If the scalar is not in the array, the function must return -1. Test the function for int and float types.
Devise a subprogram and calling code in which pass-by-reference and pass-by-value-result of one or more parameters produces different results.
The purpose of this chapter is to explore the implementation of subprograms. The discussion will provide the reader with some knowledge of how subprogram linkage works, and also why ALGOL 60 was a challenge to the unsuspecting compiler writers of the early 1960s. We begin with the simplest situation, nonnestable subprograms with static local variables, advance to more complicated subprograms with stack-dynamic local variables, and conclude with nested subprograms with stack-dynamic local variables and static scoping. The increased difficulty of implementing subprograms in languages with nested subprograms is caused by the need to include mechanisms to access nonlocal variables.
The static chain method of accessing nonlocals in static-scoped languages is discussed in detail. Then, techniques for implementing blocks are described. Finally, several methods of implementing nonlocal variable access in a dynamic-scoped language are discussed.
The subprogram call and return operations are together called subprogram linkage. The implementation of subprograms must be based on the semantics of the subprogram linkage of the language being implemented.
A subprogram call in a typical language has numerous actions associated with it. The call process must include the implementation of whatever parameter-passing method is used. If local variables are not static, the call process must allocate storage for the locals declared in the called subprogram and bind those variables to that storage. It must save the execution status of the calling program unit. The execution status is everything needed to resume execution of the calling program unit. This includes register values, CPU status bits, and the environment pointer (EP). The EP, which is discussed further in Section 10.3, is used to access parameters and local variables during the execution of a subprogram. The calling process also must arrange to transfer control to the code of the subprogram and ensure that control can return to the proper place when the subprogram execution is completed. Finally, if the language supports nested subprograms, the call process must create some mechanism to provide access to nonlocal variables that are visible to the called subprogram.
The required actions of a subprogram return are less complicated than those of a call. If the subprogram has parameters that are out mode or inout mode and are implemented by copy, the first action of the return process is to move the local values of the associated formal parameters to the actual parameters. Next, it must deallocate the storage used for local variables and restore the execution status of the calling program unit. Finally, control must be returned to the calling program unit.
We begin with the task of implementing simple subprograms. By “simple” we mean that subprograms cannot be nested and all local variables are static. Early versions of Fortran were examples of languages that had this kind of subprogram.
The semantics of a call to a “simple” subprogram requires the following actions:
1. Save the execution status of the current program unit.
2. Compute and pass the parameters.
3. Pass the return address to the called.
4. Transfer control to the called.
The semantics of a return from a simple subprogram requires the following actions:
1. If there are pass-by-value-result or out-mode parameters, the current values of those parameters are moved to or made available to the corresponding actual parameters.
2. If the subprogram is a function, the functional value is moved to a place accessible to the caller.
3. The execution status of the caller is restored.
4. Control is transferred back to the caller.
The call and return actions require storage for the following:
Status information about the caller
Parameters
Return address
Return value for functions
Temporaries used by the code of the subprograms
These, along with the local variables and the subprogram code, form the complete collection of information a subprogram needs to execute and then return control to the caller.
The question now is the distribution of the call and return actions to the caller and the called. For simple subprograms, the answer is obvious for most of the parts of the process. The last three actions of a call clearly must be done by the caller. Saving the execution status of the caller could be done by either. In the case of the return, the first, third, and fourth actions must be done by the called. Once again, the restoration of the execution status of the caller could be done by either the caller or the called. In general, the linkage actions of the called can occur at two different times, either at the beginning of its execution or at the end. These are sometimes called the prologue and epilogue of the subprogram linkage. In the case of a simple subprogram, all of the linkage actions of the callee occur at the end of its execution, so there is no need for a prologue.
A simple subprogram consists of two separate parts: the actual code of the subprogram, which is constant, and the local variables and data listed previously, which can change when the subprogram is executed. In the case of simple subprograms, both of these parts have fixed sizes.
The format, or layout, of the noncode part of a subprogram is called an activation record, because the data it describes are relevant only during the activation or execution of the subprogram. The form of an activation record is static. An activation record instance is a concrete example of an activation record, a collection of data in the form of an activation record.
Because languages with simple subprograms do not support recursion, there can be only one active version of a given subprogram at a time. Therefore, there can be only a single instance of the activation record for a subprogram. One possible layout for activation records is shown in Figure 10.1. The saved execution status of the caller is omitted here and in the remainder of this chapter because it is simple and not relevant to the discussion.
Because an activation record instance for a “simple” subprogram has fixed size, it can be statically allocated. In fact, it could be attached to the code part of the subprogram.
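To make the idea concrete, the following C sketch models a simple subprogram whose single activation record instance is statically allocated. The record layout, the subprogram AVERAGE, and every name in the code are assumptions made for illustration; the layout is not necessarily the one in Figure 10.1, and an integer stands in for the return address.

```c
/* Hypothetical activation record for a "simple" subprogram AVERAGE(a, b). */
struct average_record {
    int return_point;   /* stand-in for the return address */
    int param_a;        /* parameters */
    int param_b;
    int local_sum;      /* local variable */
    int result;         /* functional value */
};

/* Exactly one instance, allocated statically: recursion is not possible. */
static struct average_record average_ari;

/* The "code part": it reads and writes only the single static instance. */
static void average_code(void) {
    average_ari.local_sum = average_ari.param_a + average_ari.param_b;
    average_ari.result = average_ari.local_sum / 2;
}

/* Caller-side linkage: pass parameters and return point, transfer control,
   then fetch the functional value from the record. */
int call_average(int a, int b, int return_point) {
    average_ari.param_a = a;
    average_ari.param_b = b;
    average_ari.return_point = return_point;
    average_code();
    return average_ari.result;
}
```

Because every call reuses the same storage, a second activation begun before the first had finished would overwrite the first activation's parameters and locals.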
Figure 10.2 shows a program consisting of a main program and three subprograms: A, B, and C. Although the figure shows all the code segments separated from all the activation record instances, in some cases, the activation record instances are attached to their associated code segments.
The construction of the complete program shown in Figure 10.2 is not done entirely by the compiler. In fact, if the language allows independent compilation, the four program units—MAIN, A, B, and C—may have been compiled on different days, or even in different years. At the time each unit is compiled, the machine code for it, along with a list of references to external subprograms, is written to a file. The executable program shown in Figure 10.2 is put together by the linker, which is part of the operating system. (Sometimes linkers are called loaders, linker/loaders, or link editors.) When the linker is called for a main program, its first task is to find the files that contain the translated subprograms referenced in that program and load them into memory. Then, the linker must set the target addresses of all calls to those subprograms in the main program to the entry addresses of those subprograms. The same must be done for all calls to subprograms in the loaded subprograms and all calls to library subprograms. In the previous example, the linker was called for MAIN. The linker had to find the machine code programs for A, B, and C, along with their activation record instances, and load them into memory with the code for MAIN. Then, it had to patch in the target addresses for all calls to A, B, C, and any library subprograms called in A, B, C, and MAIN.
We now examine the implementation of the subprogram linkage in languages in which locals are stack dynamic, again focusing on the call and return operations.
One of the most important advantages of stack-dynamic local variables is support for recursion. Therefore, languages that use stack-dynamic local variables also support recursion.
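The dependence of recursion on stack-dynamic locals can be demonstrated with a short C sketch. Both functions below are hypothetical examples that try to compute 0 + 1 + ... + n; they differ only in the storage class of the local variable saved.

```c
/* Stack-dynamic local: each activation has its own copy of saved. */
int sum_to_dynamic(int n) {
    int saved = n;
    if (n <= 0) return 0;
    int rest = sum_to_dynamic(n - 1);
    return saved + rest;
}

/* Static local: all activations share one saved, so the recursive call
   overwrites the value that the outer activation still needs. */
int sum_to_static(int n) {
    static int saved;
    saved = n;
    if (n <= 0) return 0;
    int rest = sum_to_static(n - 1);
    return saved + rest;   /* saved is 0 here, left by the innermost call */
}
```

For n = 4 the stack-dynamic version returns 10, while the static version returns 0 because the innermost activation has already stored 0 in the shared variable.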
A discussion of the additional complexity required when subprograms can be nested is postponed until Section 10.4.
Subprogram linkage in languages that use stack-dynamic local variables is more complex than the linkage of simple subprograms for the following reasons:
The compiler must generate code to cause the implicit allocation and deallocation of local variables.
Recursion adds the possibility of multiple simultaneous activations of a subprogram, which means that there can be more than one instance (incomplete execution) of a subprogram at a given time, with at least one call from outside the subprogram and one or more recursive calls. The number of activations is limited only by the memory size of the machine. Each activation requires its own activation record instance.
The format of an activation record for a given subprogram in most languages is known at compile time. In many cases, the size is also known for activation records because all local data are of a fixed size. That is not the case in some other languages, such as Ada, in which the size of a local array can depend on the value of an actual parameter. In those cases, the format is static, but the size can be dynamic. In languages with stack-dynamic local variables, activation record instances must be created dynamically. The typical activation record for such a language is shown in Figure 10.3.
Because the return address, dynamic link, and parameters are placed in the activation record instance by the caller, these entries must appear first.
The return address usually consists of a pointer to the instruction following the call in the code segment of the calling program unit. The dynamic link is a pointer to the base of the activation record instance of the caller. In static-scoped languages, this link is used to provide traceback information when a run-time error occurs. In dynamic-scoped languages, the dynamic link is used to access nonlocal variables. The actual parameters in the activation record are the values or addresses provided by the caller.
Local scalar variables are bound to storage within an activation record instance. Local variables that are structures are sometimes allocated elsewhere, and only their descriptors and a pointer to that storage are part of the activation record. Local variables are allocated and possibly initialized in the called subprogram, so they appear last.
Consider the following skeletal C function:
void sub(float total, int part) {
    int list[5];
    float sum;
    . . .
}
The activation record for sub is shown in Figure 10.4.
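One way to picture this record is as a C structure whose fields follow the ordering described above: linkage entries first, then parameters, then locals. This is only a sketch under that assumption. The structure name and the helper function are hypothetical, and a real compiler lays out the record itself rather than through a declared struct.

```c
#include <stddef.h>

/* Hypothetical C rendering of the activation record for sub. */
struct sub_activation_record {
    void *return_address;  /* placed by the caller */
    void *dynamic_link;    /* caller's EP, saved at the call */
    float total;           /* parameters */
    int   part;
    int   list[5];         /* locals, allocated by the callee */
    float sum;
};

/* Returns 1 if the linkage entries precede the parameters and the
   parameters precede the locals. */
int linkage_entries_come_first(void) {
    return offsetof(struct sub_activation_record, dynamic_link)
               < offsetof(struct sub_activation_record, total)
        && offsetof(struct sub_activation_record, part)
               < offsetof(struct sub_activation_record, list);
}
```

C guarantees that structure members are stored in declaration order, so the field order here mirrors the order of entries in the record.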
Activating a subprogram requires the dynamic creation of an instance of the activation record for the subprogram. As stated earlier, the format of the activation record is fixed at compile time, although its size may depend on the call in some languages. Because the call and return semantics specify that the subprogram last called is the first to complete, it is reasonable to create instances of these activation records on a stack. This stack is part of the run-time system and therefore is called the run-time stack, although we will usually just refer to it as the stack. Every subprogram activation, whether recursive or nonrecursive, creates a new instance of an activation record on the stack. This provides the required separate copies of the parameters, local variables, and return address.
One more thing is required to control the execution of a subprogram: the EP. Initially, the EP points at the base, or first address, of the activation record instance of the main program. The run-time system must ensure that it always points at the base of the activation record instance of the currently executing program unit. When a subprogram is called, the current EP is saved in the new activation record instance as the dynamic link. The EP is then set to point at the base of the new activation record instance. Upon return from the subprogram, the stack top is set to the value of the current EP minus one and the EP is set to the dynamic link from the activation record instance of the subprogram that has completed its execution. Resetting the stack top effectively removes the top activation record instance.
The EP is used as the base of the offset addressing of the data contents of the activation record instance—parameters and local variables.
Note that the EP currently being used is not stored in the run-time stack. Only saved versions are stored in the activation record instances as the dynamic links.
We have now discussed several new actions in the linkage process. The lists given in Section 10.2 must be revised to take these into account. Using the activation record form given in this section, the new actions are as follows:
The caller actions are as follows:
1. Create an activation record instance.
2. Save the execution status of the current program unit.
3. Compute and pass the parameters.
4. Pass the return address to the called.
5. Transfer control to the called.
The prologue actions of the called are as follows:
1. Save the old EP in the stack as the dynamic link and create the new value.
2. Allocate local variables.
The epilogue actions of the called are as follows:
1. If there are pass-by-value-result or out-mode parameters, the current values of those parameters are moved to the corresponding actual parameters.
2. If the subprogram is a function, the functional value is moved to a place accessible to the caller.
3. Restore the stack pointer by setting it to the value of the current EP minus one and set the EP to the old dynamic link.
4. Restore the execution status of the caller.
5. Transfer control back to the caller.
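The linkage actions for stack-dynamic locals can be sketched as a small simulation in C. The sketch makes several assumptions that are not part of the text: the run-time stack is an array of int slots, integers stand in for return addresses, the value -1 marks an empty stack and the absence of a caller, locals are initialized to 0, and the saving of execution status and the copying back of out-mode parameters are omitted. All names are hypothetical.

```c
#include <assert.h>

#define STACK_SIZE 256

static int stack[STACK_SIZE];  /* simulated run-time stack, one slot per entry */
static int top = -1;           /* index of the highest slot in use */
static int ep  = -1;           /* environment pointer: base of the current record */

/* Caller actions plus callee prologue: build a new activation record instance. */
void call_subprogram(int return_address, const int *params, int nparams, int nlocals) {
    int base = top + 1;
    assert(base + 2 + nparams + nlocals <= STACK_SIZE);
    stack[base] = return_address;          /* slot 0: return address */
    stack[base + 1] = ep;                  /* slot 1: dynamic link (old EP) */
    for (int i = 0; i < nparams; i++)
        stack[base + 2 + i] = params[i];   /* parameters */
    for (int i = 0; i < nlocals; i++)
        stack[base + 2 + nparams + i] = 0; /* locals, zeroed in this sketch */
    ep = base;                             /* EP now points at the new record */
    top = base + 2 + nparams + nlocals - 1;
}

/* Callee epilogue: pop the record and hand back the saved return address. */
int return_from_subprogram(void) {
    int return_address = stack[ep];
    top = ep - 1;          /* stack top = current EP minus one */
    ep = stack[ep + 1];    /* EP = old dynamic link */
    return return_address;
}

/* Accessors so that calling code can inspect the simulated state. */
int current_ep(void)  { return ep; }
int current_top(void) { return top; }
```

Two nested calls followed by two returns leave top and ep at their initial values, which shows that resetting the stack top removes each record in last-in, first-out order.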
Recall from Chapter 9 that a subprogram is active from the time it is called until the time that execution is completed. At the time it becomes inactive, its local scope ceases to exist and its referencing environment is no longer meaningful. Therefore, at that time, its activation record instance can be destroyed.
Parameters are not always transferred in the stack. In many compilers for RISC machines, parameters are passed in registers. This is because RISC machines normally have many more registers than CISC machines. In the remainder of this chapter, however, we assume that parameters are passed in the stack. It is straightforward to modify this approach for parameters being passed in registers.
Consider the following skeletal C program:
void fun1(float r) {
    int s, t;
    . . . <---------- 1
    fun2(s);
    . . .
}
void fun2(int x) {
    int y;
    . . . <---------- 2
    fun3(y);
    . . .
}
void fun3(int q) {
    . . . <---------- 3
}
void main() {
    float p;
    . . .
    fun1(p);
    . . .
}
The sequence of function calls in this program is
main calls fun1
fun1 calls fun2
fun2 calls fun3
The stack contents for the points labeled 1, 2, and 3 are shown in Figure 10.5.
At point 1, only the activation record instances for function main and function fun1 are on the stack. When fun1 calls fun2, an instance of fun2’s activation record is created on the stack. When fun2 calls fun3, an instance of fun3’s activation record is created on the stack. When fun3’s execution ends, the instance of its activation record is removed from the stack, and the EP is used to reset the stack top pointer. Similar processes take place when functions fun2 and fun1 terminate. After the return from the call to fun1 from main, the stack has only the instance of the activation record of main. Note that some implementations do not actually use an activation record instance on the stack for main functions, such as the one shown in the figure. However, it can be done this way, and it simplifies both the implementation and our discussion. In this example and in all others in this chapter, we assume that the stack grows from lower addresses to higher addresses, although in a particular implementation, the stack may grow in the opposite direction.
The collection of dynamic links present in the stack at a given time is called the dynamic chain, or call chain. It represents the dynamic history of how execution got to its current position, which is always in the subprogram code whose activation record instance is on top of the stack. References to local variables can be represented in the code as offsets from the beginning of the activation record of the local scope, whose address is stored in the EP. Such an offset is called a local_offset.
The local_offset of a variable in an activation record can be determined at compile time, using the order, types, and sizes of variables declared in the subprogram associated with the activation record. To simplify the discussion, we assume that all variables take one position in the activation record. The first local variable declared in a subprogram would be allocated in the activation record two positions plus the number of parameters from the bottom (the first two positions are for the return address and the dynamic link). The second local variable declared would be one position nearer the stack top and so forth. For example, consider the preceding example program. In fun1, the local_offset of s is 3; for t it is 4. Likewise, in fun2, the local_offset of y is 3. To get the address of any local variable, the local_offset of the variable is added to the EP.
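The offset rule above can be sketched in a few lines of Python. This is only an illustration under the simplifying assumptions of the text (every parameter and local occupies one position, and the record begins with the return address and the dynamic link); the function names `local_offsets` and `address_of` are hypothetical.

```python
def local_offsets(params, local_vars, header_slots=2):
    """Map each name to its offset from the bottom of the activation record.

    header_slots counts the fixed entries at the bottom of the record
    (the return address and the dynamic link in the layout used here).
    Every parameter and local is assumed to occupy exactly one position.
    """
    offsets = {}
    position = header_slots
    for name in list(params) + list(local_vars):
        offsets[name] = position
        position += 1
    return offsets


def address_of(ep, offsets, name):
    """Address of a local variable: the EP plus its local_offset."""
    return ep + offsets[name]


fun1 = local_offsets(params=["r"], local_vars=["s", "t"])
fun2 = local_offsets(params=["x"], local_vars=["y"])
print(fun1)  # {'r': 2, 's': 3, 't': 4}
print(fun2)  # {'x': 2, 'y': 3}
```

The results agree with the offsets given above: 3 for s, 4 for t, and 3 for y.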
Consider the following example C program, which uses recursion to compute the factorial function:
int factorial(int n) {
        <---------- 1
  if (n <= 1)
    return 1;
  else return (n * factorial(n - 1));
        <---------- 2
}
void main() {
  int value;
  value = factorial(3);
        <---------- 3
}
The activation record format for the function factorial is shown in Figure 10.6. Notice that it has an additional entry for the return value of the function.
Figure 10.7 shows the contents of the stack for the three times execution reaches position 1 in the function factorial. Each shows one more activation of the function, with its functional value undefined. The first activation record instance has the return address to the calling function, main. The others have a return address to the function itself; these are for the recursive calls.
Figure 10.8 shows the stack contents for the three times that execution reaches position 2 in the function factorial. Position 2 is meant to be the time after the return is executed but before the activation record has been removed from the stack. Recall that the code for the function multiplies the current value of the parameter n by the value returned by the recursive call to the function. The first return from factorial returns the value 1. The activation record instance for that activation has a value of 1 for its version of the parameter n. The result from that multiplication, 1, is returned to the second activation of factorial to be multiplied by its parameter value for n, which is 2. This step returns the value 2 to the first activation of factorial to be multiplied by its parameter value for n, which is 3, yielding the final functional value of 6, which is then returned to the first call to factorial in main.
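The growth and shrinkage of the stack during this computation can be imitated with an explicit list of records. The sketch below is an illustration only: the dictionary layout and the names `factorial_with_stack`, `stack`, and `trace` are assumptions, and a real activation record would also hold a return address and a dynamic link.

```python
def factorial_with_stack(n, stack, trace):
    # Hypothetical record: the parameter plus a slot for the functional value.
    record = {"n": n, "functional_value": None}
    stack.append(record)
    # Position 1: the record exists, its functional value is still undefined.
    trace.append(("enter", [r["n"] for r in stack]))
    if n <= 1:
        record["functional_value"] = 1
    else:
        record["functional_value"] = n * factorial_with_stack(n - 1, stack, trace)
    # Position 2: the value is known, the record is still on the stack.
    trace.append(("return", n, record["functional_value"]))
    stack.pop()
    return record["functional_value"]


stack, trace = [], []
value = factorial_with_stack(3, stack, trace)
for event in trace:
    print(event)
print(value)  # 6
```

The three "enter" events show one more record each time, and the three "return" events show the values 1, 2, and 6 being produced in that order.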
Some of the non–C-based static-scoped programming languages use stack-dynamic local variables and allow subprograms to be nested. Among these are Fortran, Ada, Python, JavaScript, Ruby, and Swift, as well as the functional languages. In this section, we examine the most commonly used approach to implementing subprograms that may be nested. Until the very end of this section, we ignore closures.
A reference to a nonlocal variable in a static-scoped language with nested subprograms requires a two-step access process. All nonstatic variables that can be nonlocally accessed are in existing activation record instances and therefore are somewhere in the stack. The first step of the access process is to find the instance of the activation record in the stack in which the variable was allocated. The second step is to use the local_offset of the variable (within the activation record instance) to access it.
Finding the correct activation record instance is the more interesting and more difficult of the two steps. First, note that in a given subprogram, only variables that are declared in static ancestor scopes are visible and can be accessed. Also, activation record instances of all of the static ancestors are always on the stack when variables in them are referenced by a nested subprogram. This is guaranteed by the static semantic rules of the static-scoped languages: A subprogram is callable only when all of its static ancestor subprograms are active.¹ If a particular static ancestor were not active, its local variables would not be bound to storage, so it would be nonsense to allow access to them.
The semantics of nonlocal references dictates that the correct declaration is the first one found when looking through the enclosing scopes, most closely nested first. So, to support nonlocal references, it must be possible to find all of the instances of activation records in the stack that correspond to those static ancestors. This observation leads to the implementation approach described in the following subsection.
We do not address the issue of blocks until Section 10.5, so in the remainder of this section, all scopes are assumed to be defined by subprograms. Because functions cannot be nested in the C-based languages (the only static scopes in those languages are those created with blocks), the discussions of this section do not apply to those languages directly.
The most common way to implement static scoping in languages that allow nested subprograms is static chaining. In this approach, a new pointer, called a static link, is added to the activation record. The static link, which is sometimes called a static scope pointer, points to the bottom of the activation record instance of an activation of the static parent. It is used for accesses to nonlocal variables. Typically, the static link appears in the activation record below the parameters. The addition of the static link to the activation record requires that local offsets differ from when the static link is not included. Instead of having two activation record elements before the parameters, there are now three: the return address, the static link, and the dynamic link.
A static chain is a chain of static links that connect certain activation record instances in the stack. During the execution of a subprogram P, the static link of its activation record instance points to an activation record instance of P’s static parent program unit. That instance’s static link points in turn to P’s static grandparent program unit’s activation record instance, if there is one. So, the static chain connects all the static ancestors of an executing subprogram, in order of static parent first. This chain can obviously be used to implement the accesses to nonlocal variables in static-scoped languages.
Finding the correct activation record instance of a nonlocal variable using static links is relatively straightforward. When a reference is made to a nonlocal variable, the activation record instance containing the variable can be found by searching the static chain until a static ancestor activation record instance is found that contains the variable. However, it can be much easier than that. Because the nesting of scopes is known at compile time, the compiler can determine not only that a reference is nonlocal but also the length of the static chain that must be followed to reach the activation record instance that contains the nonlocal object.
Let static_depth be an integer associated with a static scope that indicates how deeply it is nested in the outermost scope. A program unit that is not nested inside any other unit has a static_depth of 0. If subprogram A is defined in a nonnested program unit, its static_depth is 1. If subprogram A contains the definition of a nested subprogram B, then B’s static_depth is 2.
The length of the static chain needed to reach the correct activation record instance for a nonlocal reference to a variable X is exactly the difference between the static_depth of the subprogram containing the reference to X and the static_depth of the subprogram containing the declaration for X. This difference is called the nesting_depth, or chain_offset, of the reference. The actual reference can be represented by an ordered pair of integers (chain_offset, local_offset), where chain_offset is the number of links to the correct activation record instance (local_offset is described in Section 10.3.2). For example, consider the following skeletal Python program:
# Global scope
. . .
def f1():
    def f2():
        def f3():
            . . .
        # end of f3
        . . .
    # end of f2
    . . .
# end of f1
The static_depths of the global scope, f1, f2, and f3 are 0, 1, 2, and 3, respectively. If procedure f3 references a variable declared in f1, the chain_offset of that reference would be 2 (static_depth of f3 minus the static_depth of f1). If procedure f3 references a variable declared in f2, the chain_offset of that reference would be 1. References to locals can be handled using the same mechanism, with a chain_offset of 0, but instead of using the static pointer to the activation record instance of the subprogram where the variable was declared as the base address, the EP is used.
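The arithmetic a compiler performs here can be sketched as follows. The table `STATIC_PARENT` and the function names are hypothetical; the table simply records the nesting of the skeletal program above.

```python
# Static parent of each scope in the skeletal program (None for the outermost).
STATIC_PARENT = {"global": None, "f1": "global", "f2": "f1", "f3": "f2"}


def static_depth(scope):
    """Number of scopes that enclose the given scope."""
    depth = 0
    while STATIC_PARENT[scope] is not None:
        scope = STATIC_PARENT[scope]
        depth += 1
    return depth


def chain_offset(referencing_scope, declaring_scope):
    """static_depth of the reference minus static_depth of the declaration."""
    offset = static_depth(referencing_scope) - static_depth(declaring_scope)
    # The declaring scope must be the referencing scope or a static ancestor.
    scope = referencing_scope
    for _ in range(offset):
        scope = STATIC_PARENT[scope]
    if offset < 0 or scope != declaring_scope:
        raise ValueError("declaration is not visible from the reference")
    return offset


print([static_depth(s) for s in ("global", "f1", "f2", "f3")])  # [0, 1, 2, 3]
print(chain_offset("f3", "f1"))  # 2
print(chain_offset("f3", "f2"))  # 1
print(chain_offset("f3", "f3"))  # 0
```

Because the table is known at compile time, each chain_offset is a constant that can be embedded in the generated code.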
To illustrate the complete process of nonlocal accesses, consider the following skeletal JavaScript program:
function main() {
  var x;
  function bigsub() {
    var a, b, c;
    function sub1() {
      var a, d;
      ...
      a = b + c; <---------------------------------1
      ...
    } // end of sub1
    function sub2(x) {
      var b, e;
      function sub3() {
        var c, e;
        ...
        sub1();
        ...
        e = b + a; <--------------------------------2
      } // end of sub3
      ...
      sub3();
      ...
      a = d + e; <----------------------------------3
    } // end of sub2
    ...
    sub2(7);
    ...
  } // end of bigsub
  ...
  bigsub();
  ...
} // end of main
The sequence of procedure calls is
main calls bigsub
bigsub calls sub2
sub2 calls sub3
sub3 calls sub1
The stack situation when execution first arrives at point 1 in this program is shown in Figure 10.9.
At position 1 in procedure sub1, the reference is to the local variable, a, not to the nonlocal variable a from bigsub. This reference to a has the chain_offset/local_offset pair (0, 3). The reference to b is to the nonlocal b from bigsub. It can be represented by the pair (1, 4). The local_offset is 4, because a 3 offset would be the first local variable (bigsub has no parameters). Notice that if the dynamic link were used to do a simple search for an activation record instance with a declaration for the variable b, it would find the variable b declared in sub2, which would be incorrect. If the (1, 4) pair were used with the dynamic chain, the variable e from sub3 would be used. The static link, however, points to the activation record for bigsub, which has the correct version of b. The variable b in sub2 is not in the referencing environment at this point and is (correctly) not accessible. The reference to c at point 1 is to the c defined in bigsub, which is represented by the pair (1, 5).
After sub1 completes its execution, the activation record instance for sub1 is removed from the stack, and control returns to sub3. The reference to the variable e at position 2 in sub3 is local and uses the pair (0, 4) for access. The reference to the variable b is to the one declared in sub2, because that is the nearest static ancestor that contains such a declaration. It is accessed with the pair (1, 4). The local_offset is 4 because b is the first variable declared in sub2, and sub2 has one parameter. The reference to the variable a is to the a declared in bigsub, because neither sub3 nor its static parent sub2 has a declaration for a variable named a. It is referenced with the pair (2, 3).
After sub3 completes its execution, the activation record instance for sub3 is removed from the stack, leaving only the activation record instances for main, bigsub, and sub2. At position 3 in sub2, the reference to the variable a is to the a in bigsub, which has the only declaration of a among the active routines. This access is made with the pair (1, 3). At this position, there is no visible scope containing a declaration for the variable d, so this reference to d is a static semantics error. The error would be detected when the compiler attempted to compute the chain_offset/local_offset pair. The reference to e is to the local e in sub2, which can be accessed with the pair (0, 5).
In summary, the references to the variable a at points 1, 2, and 3 would be represented by the following pairs:
(0, 3) (local)
(2, 3) (two levels away)
(1, 3) (one level away)
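The pairs discussed for points 1, 2, and 3 can be checked against a small model of the stack. The sketch below is an illustration, not an implementation from the text: the `Record` class, the slot labels, and the `resolve` function are assumptions, and each variable slot holds a label such as "bigsub.b" so that a lookup shows which variable it reached. The slot order (return address, static link, dynamic link, parameters, locals) follows the layout described above.

```python
RETURN_ADDRESS, STATIC_LINK, DYNAMIC_LINK = 0, 1, 2  # fixed header slots


class Record:
    """One activation record instance, modeled as a list of slots."""

    def __init__(self, name, static_link, dynamic_link, names):
        self.name = name
        self.slots = ["<return address>", static_link, dynamic_link]
        # Parameters first, then locals, one slot each.
        self.slots += [f"{name}.{v}" for v in names]


def resolve(record, chain_offset, local_offset, link=STATIC_LINK):
    """Follow chain_offset links, then index with local_offset."""
    for _ in range(chain_offset):
        record = record.slots[link]
    return record.slots[local_offset]


# Stack when execution first arrives at point 1.
main = Record("main", None, None, ["x"])
bigsub = Record("bigsub", main, main, ["a", "b", "c"])
sub2 = Record("sub2", bigsub, bigsub, ["x", "b", "e"])
sub3 = Record("sub3", sub2, sub2, ["c", "e"])
sub1 = Record("sub1", bigsub, sub3, ["a", "d"])

print(resolve(sub1, 0, 3))  # sub1.a
print(resolve(sub1, 1, 4))  # bigsub.b
print(resolve(sub3, 2, 3))  # bigsub.a
print(resolve(sub2, 1, 3))  # bigsub.a
# Following the dynamic chain with the pair (1, 4) reaches the wrong variable.
print(resolve(sub1, 1, 4, link=DYNAMIC_LINK))  # sub3.e
```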
It is reasonable at this point to ask how the static chain is maintained during program execution. If its maintenance is too complex, the fact that it is simple and effective would be unimportant. We assume here that parameters that are subprograms are not implemented.
The static chain must be modified for each subprogram call and return. The return part is trivial: When the subprogram terminates, its activation record instance is removed from the stack. After this removal, the new top activation record instance is that of the unit that called the subprogram whose execution just terminated. Because the static chain from this activation record instance was never changed, it works correctly just as it did before the call to the other subprogram. Therefore, no other action is required.
The action required at a subprogram call is more complex. Although the correct parent scope is easily determined at compile time, the most recent activation record instance of the parent scope must be found at the time of the call. This can be done by looking at activation record instances on the dynamic chain until the first one of the parent scope is found. However, this search can be avoided by treating subprogram declarations and references exactly like variable declarations and references. When the compiler encounters a subprogram call, among other things, it determines the subprogram that declared the called subprogram, which must be a static ancestor of the calling routine. It then computes the nesting_depth, or number of enclosing scopes between the caller and the subprogram that declared the called subprogram. This information is stored and can be accessed by the subprogram call during execution. At the time of the call, the static link of the called subprogram’s activation record instance is found by moving down the static chain of the caller. The number of links in this move is equal to the nesting_depth, which was computed at compile time.
Consider again the program main and the stack situation shown in Figure 10.9. At the call to sub1 in sub3, the compiler determines the nesting_depth of sub3 (the caller) to be two levels inside the procedure that declared the called procedure sub1, which is bigsub. When the call to sub1 in sub3 is executed, this information is used to set the static link of the activation record instance for sub1. This static link is set to point to the activation record instance that is pointed to by the second static link in the static chain from the caller’s activation record instance. In this case, the caller is sub3, whose static link points to its parent’s activation record instance (that of sub2). The static link of the activation record instance for sub2 points to the activation record instance for bigsub. So, the static link for the new activation record instance for sub1 is set to point to the activation record instance for bigsub.
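The call-time action can be sketched as follows, assuming that the compiler has already supplied the nesting_depth for each call site. The class `ActivationRecord` and the function `call` are hypothetical names used only for this illustration.

```python
class ActivationRecord:
    def __init__(self, name, static_link, dynamic_link):
        self.name = name
        self.static_link = static_link    # instance of the static parent
        self.dynamic_link = dynamic_link  # instance of the caller


def call(caller, callee_name, nesting_depth):
    """Create the callee's record instance.

    nesting_depth is the number of static links to follow from the caller
    to reach the instance of the subprogram that declared the callee.
    It is a compile-time constant for each call site.
    """
    parent = caller
    for _ in range(nesting_depth):
        parent = parent.static_link
    return ActivationRecord(callee_name, static_link=parent, dynamic_link=caller)


main = ActivationRecord("main", None, None)
bigsub = call(main, "bigsub", 0)  # bigsub is declared in main
sub2 = call(bigsub, "sub2", 0)    # sub2 is declared in bigsub
sub3 = call(sub2, "sub3", 0)      # sub3 is declared in sub2
sub1 = call(sub3, "sub1", 2)      # sub1 is declared in bigsub, two levels out

print(sub1.static_link.name)   # bigsub
print(sub1.dynamic_link.name)  # sub3
```

No search of the dynamic chain is needed: the number of links followed is fixed when the call is compiled.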
This method works for all subprogram linkage, except when parameters that are subprograms are involved.
One criticism of using the static chain approach to access nonlocal variables is that references to variables in scopes beyond the static parent cost more than references to locals. The static chain must be followed, one link per enclosing scope from the reference to the declaration. Fortunately, in practice, references to distant nonlocal variables are rare, so this is not a serious problem. Another criticism of the static-chain approach is that it is difficult for a programmer working on a time-critical program to estimate the costs of nonlocal references, because the cost of each reference depends on the depth of nesting between the reference and the scope of declaration. Further complicating this problem is that subsequent code modifications may change nesting depths, thereby changing the timing of some references, both in the changed code and possibly in code far from the changes.
Some alternatives to static chains have been developed, most notably an approach that uses an auxiliary data structure called a display. However, none of the alternatives has been found to be superior to the static-chain method, which is still the most widely used approach. Therefore, none of the alternatives are discussed here.
The processes and data structures described in this section correctly implement closures in languages that do not permit functions to return functions and do not allow functions to be assigned to variables. However, they are inadequate for languages that do allow one or both of those operations. Several new mechanisms are needed to implement access to nonlocals in such languages. First, if a subprogram accesses a variable from a nesting but not global scope, that variable cannot be stored only in the activation record of its home scope. That activation record could be deallocated before the subprogram that needs it is activated. Such variables could also be stored in the heap and given unlimited extent (their lifetimes are the lifetime of the whole program). Second, subprograms must have mechanisms to access the nonlocals that are stored in the heap. Third, the heap-allocated variables that are nonlocally accessed must be updated every time their stack versions are updated. Clearly, these are nontrivial extensions to the implementation of static scoping using static chains.
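The problem can be seen in any language that lets a function return a nested function. In the following Python sketch (the names `make_counter` and `increment` are illustrative), the variable count is still needed after the activation of make_counter has ended, so it cannot live only in that activation record. CPython handles this by keeping captured variables in heap-allocated cell objects, which is one possible mechanism and not necessarily the one described above.

```python
def make_counter():
    count = 0  # local to make_counter, but referenced by increment

    def increment():
        nonlocal count
        count += 1
        return count

    return increment  # the activation of make_counter ends here


counter = make_counter()
print(counter(), counter())  # 1 2
# The captured variable survives in a cell attached to the closure.
print(counter.__closure__[0].cell_contents)  # 2
```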
Recall from Chapter 5 that a number of languages, including the C-based languages, provide for user-specified local scopes for variables called blocks. As an example of a block, consider the following code segment:
{ int temp;
  temp = list[upper];
  list[upper] = list[lower];
  list[lower] = temp;
}
A block is specified in the C-based languages as a compound statement that begins with one or more data definitions. The lifetime of the variable temp in the preceding block begins when control enters the block and ends when control exits the block. The advantage of using such a local is that it cannot interfere with any other variable with the same name that is declared elsewhere in the program, or more specifically, in the referencing environment of the block.
Blocks can be implemented by using the static-chain process described in Section 10.4 for implementing nested subprograms. Blocks are treated as parameterless subprograms that are always called from the same place in the program. Therefore, every block has an activation record. An instance of its activation record is created every time the block is executed.
Blocks can also be implemented in a different and somewhat simpler and more efficient way. The maximum amount of storage required for block variables at any time during the execution of a program can be statically determined, because blocks are entered and exited in strictly textual order. This amount of space can be allocated after the local variables in the activation record. Offsets for all block variables can be statically computed, so block variables can be addressed exactly as if they were local variables.
For example, consider the following skeletal program:
void main() {
  int x, y, z;
  while ( . . . ) {
    int a, b, c;
    . . .
    while ( . . . ) {
      int d, e;
      . . .
    }
  }
  while ( . . . ) {
    int f, g;
    . . .
  }
  . . .
}
For this program, the static-memory layout shown in Figure 10.10 could be used. Note that f and g occupy the same memory locations as a and b, because a and b are popped off the stack when their block is exited (before f and g are allocated).
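The static computation of such a layout can be sketched as a recursive walk over the block structure. The representation of a block as a pair (variables, nested blocks) and the function name `layout` are assumptions made for this illustration, and offsets are counted from the start of the local-variable area, one position per variable.

```python
def layout(block, start, offsets):
    """Assign offsets to a block's variables and to those of nested blocks.

    Sibling blocks start at the same position, so they share storage.
    Returns the first position beyond all storage the block ever needs.
    """
    variables, nested = block
    position = start
    for name in variables:
        offsets[name] = position
        position += 1
    high_water = position
    for child in nested:
        high_water = max(high_water, layout(child, position, offsets))
    return high_water


# Block structure of the skeletal program above.
main_body = (
    ["x", "y", "z"],
    [
        (["a", "b", "c"], [(["d", "e"], [])]),
        (["f", "g"], []),
    ],
)

offsets = {}
size = layout(main_body, 0, offsets)
print(offsets)
print(size)  # 8
```

In the result, f and g receive the same offsets as a and b, and eight positions suffice for all ten variables.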
There are at least two distinct ways in which local variables and nonlocal references to them can be implemented in a dynamic-scoped language: deep access and shallow access. Note that deep access and shallow access are not concepts related to deep and shallow binding. An important difference between binding and access is that deep and shallow bindings result in different semantics; deep and shallow accesses do not.
If local variables are stack dynamic and are part of the activation records in a dynamic-scoped language, references to nonlocal variables can be resolved by searching through the activation record instances of the other subprograms that are currently active, beginning with the one most recently activated. This concept is similar to that of accessing nonlocal variables in a static-scoped language with nested subprograms, except that the dynamic—rather than the static—chain is followed. The dynamic chain links together all subprogram activation record instances in the reverse of the order in which they were activated. Therefore, the dynamic chain is exactly what is needed to reference nonlocal variables in a dynamic-scoped language. This method is called deep access, because access may require searches deep into the stack.
Consider the following example skeletal program:
void sub3() {
  int x, z;
  x = u + v;
  . . .
}
void sub2() {
  int w, x;
  . . .
}
void sub1() {
  int v, w;
  . . .
}
void main() {
  int v, u;
  . . .
}
This program is written in a syntax that gives it the appearance of a program in a C-based language, but it is not meant to be in any particular language. Suppose the following sequence of function calls occurs:
main calls sub1
sub1 calls sub1
sub1 calls sub2
sub2 calls sub3
Figure 10.11 shows the stack during the execution of function sub3 after this calling sequence. Notice that the activation record instances do not have static links, which would serve no purpose in a dynamic-scoped language.
Consider the references to the variables x, u, and v in function sub3. The reference to x is found in the activation record instance for sub3. The reference to u is found by searching all of the activation record instances on the stack, because the only existing variable with that name is in main. This search involves following four dynamic links and examining 10 variable names. The reference to v is found in the most recent (nearest on the dynamic chain) activation record instance for the subprogram sub1.
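The search described above can be imitated with a list that stands in for the stack. In this sketch, the record layout (a dictionary holding the subprogram name and its variable names) and the function `deep_lookup` are assumptions; walking the list from the top corresponds to following dynamic links.

```python
def deep_lookup(stack, name):
    """Search from the most recent record instance toward the oldest.

    Returns (stack index of the instance found, dynamic links followed,
    variable names examined).
    """
    names_examined = 0
    for links_followed, index in enumerate(range(len(stack) - 1, -1, -1)):
        for declared in stack[index]["vars"]:
            names_examined += 1
            if declared == name:
                return index, links_followed, names_examined
    raise NameError(name)


# Stack after: main calls sub1, sub1 calls sub1, sub1 calls sub2, sub2 calls sub3.
stack = [
    {"sub": "main", "vars": ["v", "u"]},
    {"sub": "sub1", "vars": ["v", "w"]},
    {"sub": "sub1", "vars": ["v", "w"]},
    {"sub": "sub2", "vars": ["w", "x"]},
    {"sub": "sub3", "vars": ["x", "z"]},
]

print(deep_lookup(stack, "x"))  # (4, 0, 1)
print(deep_lookup(stack, "u"))  # (0, 4, 10)
print(deep_lookup(stack, "v"))  # (2, 2, 5)
```

The lookup of u follows four links and examines 10 names, as stated above, and the lookup of v stops at the more recent of the two instances of sub1.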
There are two important differences between the deep-access method for nonlocal access in a dynamic-scoped language and the static-chain method for static-scoped languages. First, in a dynamic-scoped language, there is no way to determine at compile time the length of the chain that must be searched. Every activation record instance in the chain must be searched until the first instance of the variable is found. This is one reason why dynamic-scoped languages typically have slower execution speeds than static-scoped languages. Second, activation records must store the names of variables for the search process, whereas in static-scoped language implementations only the values are required. (Names are not required for static scoping, because all variables are represented by the chain_offset/local_offset pairs.)
Shallow access is an alternative implementation method, not an alternative semantics. As stated previously, the semantics of deep access and shallow access are identical. In the shallow-access method, variables declared in subprograms are not stored in the activation records of those subprograms. Because with dynamic scoping there is at most one visible version of a variable of any specific name at a given time, a very different approach can be taken. One variation of shallow access is to have a separate stack for each variable name in a complete program. Every time a new variable with a particular name is created by a declaration at the beginning of a subprogram that has been called, the variable is given a cell at the top of the stack for its name. Every reference to the name is to the variable on top of the stack associated with that name, because the top one is the most recently created. When a subprogram terminates, the lifetimes of its local variables end, and the stacks for those variable names are popped. This method allows fast references to variables, but maintaining the stacks at the entrances and exits of subprograms is costly.
图 10.12显示了先前示例程序中的变量堆栈,其情况与图 10.11中的堆栈相同。
Figure 10.12 shows the variable stacks for the earlier example program in the same situation as shown with the stack in Figure 10.11.
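The stack-per-name variation of shallow access might look like the following Python sketch. The class `ShallowStacks` and its method names are assumptions made for illustration; they are not taken from any particular implementation.

```python
from collections import defaultdict


class ShallowStacks:
    """Hypothetical runtime that keeps one stack per variable name."""

    def __init__(self):
        self._stacks = defaultdict(list)

    def enter(self, local_names):
        """Subprogram entry: push a fresh cell for every local variable."""
        for name in local_names:
            self._stacks[name].append(None)

    def leave(self, local_names):
        """Subprogram exit: pop the cells that were created at entry."""
        for name in local_names:
            self._stacks[name].pop()

    def get(self, name):
        # The top cell is always the most recently created binding.
        return self._stacks[name][-1]

    def set(self, name, value):
        self._stacks[name][-1] = value


rt = ShallowStacks()
rt.enter(["x", "y"])   # an outer subprogram declares x and y
rt.set("x", 1)
rt.enter(["x"])        # a callee declares its own x
rt.set("x", 2)
inner_x = rt.get("x")  # refers to the callee's x
rt.leave(["x"])        # the callee returns
outer_x = rt.get("x")  # the caller's x is visible again
```

References cost a single indexing step regardless of call depth, while every entry and exit loops over the subprogram's locals, which is the trade-off described above.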
实现浅层访问的另一种选择是使用一个中央表,该表为程序中的每个不同变量名称提供了一个位置。除了每个条目外,还会维护一个称为active 的位,该位指示该名称是否具有当前绑定或变量关联。然后,对任何变量的任何访问都可以是中央表中的偏移量。偏移量是静态的,因此访问速度可以很快。SNOBOL 实现使用中央表实现技术。
Another option for implementing shallow access is to use a central table that has a location for each different variable name in a program. Along with each entry, a bit called active is maintained that indicates whether the name has a current binding or variable association. Any access to any variable can then be to an offset into the central table. The offset is static, so the access can be fast. SNOBOL implementations use the central table implementation technique.
中央表的维护很简单。子程序调用要求其所有局部变量都逻辑地放置在中央表中。如果中央表中新变量的位置已经处于活动状态(即,如果它包含一个生命周期尚未结束的变量(由活动位指示)),则必须在新变量的生命周期内将该值保存在某个位置。每当变量开始其生命周期时,都必须设置其中央表位置中的活动位。
Maintenance of a central table is straightforward. A subprogram call requires that all of its local variables be logically placed in the central table. If the position of the new variable in the central table is already active—that is, if it contains a variable whose lifetime has not yet ended (which is indicated by the active bit)—that value must be saved somewhere during the lifetime of the new variable. Whenever a variable begins its lifetime, the active bit in its central table position must be set.
在中央表的设计以及临时替换值时存储值的方式方面,存在多种变化。一种变化是使用一个“隐藏”堆栈,所有保存的对象都存储在该堆栈中。由于子程序调用和返回(因此局部变量的生命周期)是嵌套的,因此这种方法效果很好。
There have been several variations in the design of the central table and in the way values are stored when they are temporarily replaced. One variation is to have a “hidden” stack on which all saved objects are stored. Because subprogram calls and returns, and thus the lifetimes of local variables, are nested, this works well.
第二种变体可能是最简洁且实现成本最低的变体。使用单个单元格的中央表,仅存储具有唯一名称的每个变量的当前版本。替换的变量存储在创建替换变量的子程序的激活记录中。这是一种堆栈机制,但它使用已经存在的堆栈,因此新的开销最小。
The second variation is perhaps the cleanest and least expensive to implement. A central table of single cells is used, storing only the current version of each variable with a unique name. Replaced variables are stored in the activation record of the subprogram that created the replacement variable. This is a stack mechanism, but it uses the stack that already exists, so the new overhead is minimal.
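A rough Python sketch of this second variation follows. The names `CentralTable`, `call`, and `ret` are hypothetical, and the dictionary returned by `call` stands in for the part of the activation record that holds replaced variables. As a simplification, the sketch saves every displaced entry, whether or not its active bit was set.

```python
class CentralTable:
    """Hypothetical central table: one cell and one active bit per name."""

    def __init__(self, names):
        self.value = {name: None for name in names}
        self.active = {name: False for name in names}

    def call(self, local_names):
        """Enter a subprogram; return the entries to keep in its activation record."""
        saved = {}
        for name in local_names:
            # Save whatever occupied the cell, together with its active bit.
            saved[name] = (self.value[name], self.active[name])
            self.value[name] = None
            self.active[name] = True
        return saved

    def ret(self, saved):
        """Leave a subprogram; restore the entries displaced at the call."""
        for name, (old_value, was_active) in saved.items():
            self.value[name] = old_value
            self.active[name] = was_active


table = CentralTable(["x", "y"])
outer = table.call(["x"])        # an outer subprogram declares x
table.value["x"] = 10
inner = table.call(["x", "y"])   # a callee declares x and y
table.value["x"] = 20
during = table.value["x"]        # the callee's x
table.ret(inner)
after = table.value["x"]         # the caller's x is back in the table
```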
对非局部变量进行浅层和深层访问的选择取决于子程序调用和非局部引用的相对频率。深层访问方法提供了快速的子程序链接,但对非局部变量的引用,尤其是对远距离非本地变量的引用(就调用链而言),成本较高。浅访问方法提供了对非本地变量(尤其是远距离非本地变量)的更快引用,但在子程序链接方面成本较高。
The choice between shallow and deep access to nonlocal variables depends on the relative frequencies of subprogram calls and nonlocal references. The deep-access method provides fast subprogram linkage, but references to nonlocals, especially references to distant nonlocals (in terms of the call chain), are costly. The shallow-access method provides much faster references to nonlocals, especially distant nonlocals, but is more costly in terms of subprogram linkage.
子程序链接语义要求实现者采取许多操作。对于“简单”子程序,这些操作相对简单。在调用时,必须保存执行状态,必须将参数和返回地址传递给被调用的子程序,并且必须转移控制权。在返回时,必须将按结果传递和按值传递结果参数的值以及返回值(如果是函数)传回,必须恢复执行状态,并将控制权传回给调用者。在具有堆栈动态局部变量和嵌套子程序的语言中,子程序链接更为复杂。可能有多个活动记录实例,这些实例必须存储在运行时堆栈中,并且必须在活动记录实例中维护静态和动态链接。静态链接是为了允许在静态作用域语言中引用非局部变量。
Subprogram linkage semantics requires many actions by the implementation. In the case of “simple” subprograms, these actions are relatively uncomplicated. At the call, the status of execution must be saved, parameters and the return address must be passed to the called subprogram, and control must be transferred. At the return, the values of pass-by-result and pass-by-value-result parameters must be transferred back, as well as the return value if it is a function, execution status must be restored, and control transferred back to the caller. In languages with stack-dynamic local variables and nested subprograms, subprogram linkage is more complex. There may be more than one activation record instance, those instances must be stored on the run-time stack, and static and dynamic links must be maintained in the activation record instances. The static link is to allow references to nonlocal variables in static-scoped languages.
具有堆栈动态局部变量和嵌套子程序的语言中的子程序有两个组成部分:实际代码(静态)和活动记录(堆栈动态)。活动记录实例包含形式参数和局部变量等。对非局部变量的访问是通过静态父指针链实现的。
Subprograms in languages with stack-dynamic local variables and nested subprograms have two components: the actual code, which is static, and the activation record, which is stack dynamic. Activation record instances contain the formal parameters and local variables, among other things. Access to nonlocal variables is implemented with a chain of static parent pointers.
在动态作用域语言中,访问非局部变量可以通过使用动态链或通过某种中央变量表方法来实现。动态链提供较慢的访问速度,但调用和返回速度较快。中央表方法提供较快的访问速度,但调用和返回速度较慢。
Access to nonlocal variables in a dynamic-scoped language can be implemented by use of the dynamic chain or through some central variable table method. Dynamic chains provide slow accesses but fast calls and returns. The central table methods provide fast accesses but slow calls and returns.
本章中“简单”子程序的定义是什么?
What is the definition used in this chapter for “simple” subprograms?
调用者或者被调用者哪个保存了执行状态信息?
Which saves execution status information, the caller or the callee?
为了链接到子程序必须存储什么?
What must be stored for the linkage to a subprogram?
链接器的任务是什么?
What is the task of a linker?
实现具有堆栈动态局部变量的子程序比实现简单子程序更困难的两个原因是什么?
What are the two reasons why implementing subprograms with stack-dynamic local variables is more difficult than implementing simple subprograms?
激活记录和激活记录实例有什么区别?
What is the difference between an activation record and an activation record instance?
为什么返回地址,动态链接和参数放在激活记录的底部?
Why are the return address, dynamic link, and parameters placed in the bottom of the activation record?
哪些机器经常使用寄存器来传递参数?
What kind of machines often use registers to pass parameters?
在具有堆栈动态局部变量和嵌套子程序的静态作用域语言中定位非局部变量的两个步骤是什么?
What are the two steps in locating a nonlocal variable in a static-scoped language with stack-dynamic local variables and nested subprograms?
定义静态链、static_depth、nesting_depth和chain_offset。
Define static chain, static_depth, nesting_depth, and chain_offset.
什么是 EP,其用途是什么?
What is an EP, and what is its purpose?
静态链方法中对变量的引用是如何表示的?
How are references to variables represented in the static-chain method?
说出三种不允许嵌套子程序的广泛使用的编程语言。
Name three widely used programming languages that do not allow nested subprograms.
静态链方法有哪两个潜在问题?
What are the two potential problems with the static-chain method?
解释实现块的两种方法。
Explain the two methods of implementing blocks.
描述实现动态作用域的深度访问方法。
Describe the deep-access method of implementing dynamic scoping.
描述实现动态作用域的浅访问方法。
Describe the shallow-access method of implementing dynamic scoping.
动态范围语言中非本地访问的深度访问方法和静态范围语言的静态链方法之间有哪两个区别?
What are the two differences between the deep-access method for nonlocal access in dynamic-scoped languages and the static-chain method for static-scoped languages?
从调用和非本地访问两个方面比较深度访问方法与浅层访问方法的效率。
Compare the efficiency of the deep-access method to that of the shallow-access method, in terms of both calls and nonlocal accesses.
当执行到达以下骨架程序中的位置 1 时,显示包含所有活动记录实例(包括静态和动态链)的堆栈。假设bigsub位于级别 1。
function bigsub() {
  function a() {
    function b() {
      ... <----------------------------1
    } // end of b
    function c() {
      ...
      b();
      ...
    } // end of c
    ...
    c();
    ...
  } // end of a
  ...
  a();
  ...
} // end of bigsub
Show the stack with all activation record instances, including static and dynamic chains, when execution reaches position 1 in the following skeletal program. Assume bigsub is at level 1.
function bigsub() {
  function a() {
    function b() {
      ... <----------------------------1
    } // end of b
    function c() {
      ...
      b();
      ...
    } // end of c
    ...
    c();
    ...
  } // end of a
  ...
  a();
  ...
} // end of bigsub
当执行到达以下骨架程序中的位置 1 时,显示包含所有活动记录实例(包括静态和动态链)的堆栈。假设bigsub位于级别 1。
function bigsub() {
  var mysum;
  function a() {
    var x;
    function b(sum) {
      var y, z;
      ...
      c(z);
      ...
    } // end of b
    ...
    b(x);
    ...
  } // end of a
  function c(plums) {
    ... <-----------------------------1
  } // end of c
  var l;
  ...
  a();
  ...
} // end of bigsub
Show the stack with all activation record instances, including static and dynamic chains, when execution reaches position 1 in the following skeletal program. Assume bigsub is at level 1.
function bigsub() {
  var mysum;
  function a() {
    var x;
    function b(sum) {
      var y, z;
      ...
      c(z);
      ...
    } // end of b
    ...
    b(x);
    ...
  } // end of a
  function c(plums) {
    ... <-----------------------------1
  } // end of c
  var l;
  ...
  a();
  ...
} // end of bigsub
当执行到达以下骨架程序中的位置 1 时,显示包含所有活动记录实例(包括静态和动态链)的堆栈。假设bigsub位于级别 1。
function bigsub() {
  function a(flag) {
    function b() {
      ...
      a(false);
      ...
    } // end of b
    ...
    if (flag)
      b();
    else c();
    ...
  } // end of a
  function c() {
    function d() {
      ... <------------------------1
    } // end of d
    ...
    d();
    ...
  } // end of c
  ...
  a(true);
  ...
} // end of bigsub
该程序执行的调用顺序d是
bigsub 调用 a
a 调用 b
b 调用 a
a 调用 c
c 调用 d
Show the stack with all activation record instances, including static and dynamic chains, when execution reaches position 1 in the following skeletal program. Assume bigsub is at level 1.
function bigsub() {
  function a(flag) {
    function b() {
      ...
      a(false);
      ...
    } // end of b
    ...
    if (flag)
      b();
    else c();
    ...
  } // end of a
  function c() {
    function d() {
      ... <------------------------1
    } // end of d
    ...
    d();
    ...
  } // end of c
  ...
  a(true);
  ...
} // end of bigsub
The calling sequence for this program for execution to reach d is
bigsub calls a
a calls b
b calls a
a calls c
c calls d
当执行到达以下骨架程序中的位置 1 时,显示包含所有活动记录实例(包括静态和动态链)的堆栈。此程序使用深度访问方法实现动态作用域。
void fun1() {
  float a;
  . . .
}
void fun2() {
  int b, c;
  . . .
}
void fun3() {
  float d;
  . . . <--------- 1
}
void main() {
  char e, f, g;
  . . .
}
该程序执行的调用顺序fun3是
main调用 fun2
fun2调用 fun1
fun1调用 fun1
fun1调用 fun3
Show the stack with all activation record instances, including the dynamic chain, when execution reaches position 1 in the following skeletal program. This program uses the deep-access method to implement dynamic scoping.
void fun1() {
  float a;
  . . .
}
void fun2() {
  int b, c;
  . . .
}
void fun3() {
  float d;
  . . . <--------- 1
}
void main() {
  char e, f, g;
  . . .
}
The calling sequence for this program for execution to reach fun3 is
main calls fun2
fun2 calls fun1
fun1 calls fun1
fun1 calls fun3
假设问题 4 中的程序是使用浅访问方法实现的,每个变量名都有一个堆栈。显示执行时的堆栈fun3,假设执行通过问题 4 中显示的调用序列到达该点。
Assume that the program of Problem 4 is implemented with the shallow-access method, using a stack for each variable name. Show the stacks at the time of the execution of fun3, assuming execution reached that point through the sequence of calls shown in Problem 4.
虽然 Java 方法中的局部变量是在每次激活开始时动态分配的,但是在什么情况下特定激活中的局部变量的值可以保留前一次激活的值?
Although local variables in Java methods are dynamically allocated at the beginning of each activation, under what circumstances could the value of a local variable in a particular activation retain the value of the previous activation?
本章指出,当使用动态链在动态作用域语言中访问非局部变量时,变量名必须与值一起存储在活动记录中。如果真的这样做,每次非局部访问都需要对名称进行一系列昂贵的字符串比较。设计一种更快的字符串比较替代方案。
It is stated in this chapter that when nonlocal variables are accessed in a dynamic-scoped language using the dynamic chain, variable names must be stored in the activation records with the values. If this were actually done, every nonlocal access would require a sequence of costly string comparisons on names. Design an alternative to these string comparisons that would be faster.
Pascal allows gotos with nonlocal targets. How could such statements be handled if static chains were used for nonlocal variable access? Hint: Consider the way the correct activation record instance of the static parent of a newly enacted procedure is found (see Section 10.4.2).
静态链方法可以稍微扩展一下,在每个活动记录实例中使用两个静态链接,其中第二个指向静态祖父活动记录实例。这种方法会如何影响子程序链接和非本地引用所需的时间?
The static-chain method could be expanded slightly by using two static links in each activation record instance where the second points to the static grandparent activation record instance. How would this approach affect the time required for subprogram linkage and nonlocal references?
设计一个骨架程序和一个调用序列,产生一个激活记录实例,其中静态和动态链接指向运行时堆栈中不同的激活记录实例。
Design a skeletal program and a calling sequence that results in an activation record instance in which the static and dynamic links point to different activation record instances in the run-time stack.
如果编译器使用静态链方法来实现块,那么子程序的激活记录中的哪些条目是块的激活记录中需要的?
If a compiler uses the static chain approach to implementing blocks, which of the entries in the activation records for subprograms are needed in the activation records for blocks?
检查三种不同体系结构的子程序调用指令,包括至少一种 CISC 计算机和一种 RISC 计算机,并简短地比较它们的功能。(这些指令的设计通常至少决定了编译器编写者对子程序链接的设计的一部分。)
Examine the subprogram call instructions of three different architectures, including at least one CISC machine and one RISC machine, and write a short comparison of their capabilities. (The design of these instructions usually determines at least part of the compiler writer’s design of subprogram linkage.)
编写一个包含两个子程序的程序,一个子程序接受一个参数并对该参数执行一些简单操作,另一个子程序接受 20 个参数并使用所有参数,但仅用于一个简单的操作。主程序必须多次调用这两个子程序。在程序计时代码中包含输出对两个子程序的调用的运行时间。在 RISC 计算机和 CISC 计算机上运行该程序,并比较两个子程序所需时间的比率。根据结果,您能说出这两台机器上参数传递的速度如何?
Write a program that includes two subprograms, one that takes a single parameter and performs some simple operation on that parameter and one that takes 20 parameters and uses all of the parameters, but only for one simple operation. The main program must call these two subprograms a large number of times. Include in the program timing code to output the run time of the calls to each of the two subprograms. Run the program on a RISC machine and on a CISC machine and compare the ratios of the time required by the two subprograms. Based on the results, what can you say about the speed of parameter passing on the two machines?
本章将探讨支持数据抽象的编程语言结构。在过去 50 年编程方法和编程语言设计的新思想中,数据抽象是最深刻的思想之一。
In this chapter, we explore programming language constructs that support data abstraction. Among the new ideas of the last 50 years in programming methodologies and programming language design, data abstraction is one of the most profound.
我们首先讨论编程和编程语言中抽象的一般概念。然后定义数据抽象并用一个例子来说明。本主题之后是 C++、Java、C# 和 Ruby 对数据抽象支持的描述。为了阐明支持数据抽象的语言功能在设计上的异同,我们在 C++、Java 和 Ruby 中给出了相同示例数据抽象的实现。接下来,讨论 C++、Java 5.0 和 C# 2005 构建参数化抽象数据类型的能力。
We begin by discussing the general concept of abstraction in programming and programming languages. Data abstraction is then defined and illustrated with an example. This topic is followed by descriptions of the support for data abstraction in C++, Java, C#, and Ruby. To illuminate the similarities and differences in the design of the language facilities that support data abstraction, implementations of the same example data abstraction are given in C++, Java, and Ruby. Next, the capabilities of C++, Java 5.0, and C# 2005 to build parameterized abstract data types are discussed.
本章中用来说明抽象数据类型的概念和结构的所有语言都支持面向对象编程,因为几乎所有现代语言都支持面向对象编程,而几乎所有不支持抽象数据类型的语言都已逐渐消失。
All the languages used in this chapter to illustrate the concepts and constructs of abstract data types support object-oriented programming, because virtually all contemporary languages support object-oriented programming and nearly all of those that do not, and yet support abstract data types, have faded into obscurity.
支持抽象数据类型的构造是对该类型对象的数据和操作的封装。构建大型程序需要包含多种类型的封装。本章还讨论了这些封装和相关的命名空间问题。
Constructs that support abstract data types are encapsulations of the data and operations on objects of the type. Encapsulations that contain multiple types are required for the construction of larger programs. These encapsulations and the associated namespace issues are also discussed in this chapter.
一些编程语言支持逻辑封装(与物理封装相反),逻辑封装实际上用于封装名称。这些内容在第11.7节 中讨论。
Some programming languages support logical, as opposed to physical, encapsulations, which are actually used to encapsulate names. These are discussed in Section 11.7.
抽象是实体的视图或表示,仅包含最重要的属性。从一般意义上讲,抽象允许将实体实例收集到组中,而无需考虑它们的共同属性。例如,假设我们将鸟类定义为具有以下属性的生物:两个翅膀、两条腿、一条尾巴和羽毛。那么,如果我们说乌鸦是一种鸟,那么对乌鸦的描述就不需要包括这些属性。知更鸟、麻雀和黄腹吸汁啄木鸟也是如此。特定鸟类描述中的共同属性可以抽象出来,因为所有物种都有这些属性。在特定物种中,只需要考虑区分该物种的属性。例如,乌鸦具有黑色、特定尺寸和吵闹的属性。乌鸦的描述需要提供这些属性,但不需要提供所有鸟类共有的其他属性。这大大简化了物种成员的描述。当需要了解更高层次的细节而不仅仅是特殊属性时,可以考虑对一个物种(例如鸟类)采用不太抽象的视角。
An abstraction is a view or representation of an entity that includes only the most significant attributes. In a general sense, abstraction allows one to collect instances of entities into groups in which their common attributes need not be considered. For example, suppose we define birds to be creatures with the following attributes: two wings, two legs, a tail, and feathers. Then, if we say a crow is a bird, a description of a crow need not include those attributes. The same is true for robins, sparrows, and yellow-bellied sapsuckers. The common attributes in the descriptions of specific species of birds can be abstracted away, because all species have them. Within a particular species, only the attributes that distinguish that species need be considered. For example, crows have the attributes of being black, being of a particular size, and being noisy. A description of a crow needs to provide those attributes, but not the others that are common to all birds. This results in significant simplification of the descriptions of members of the species. A less abstract view of a species, that of a bird, may be considered when it is necessary to see a higher level of detail, rather than just the special attributes.
在编程语言的世界里,抽象是对抗编程复杂性的武器,其目的是简化编程过程。它是一种有效的武器,因为它允许程序员专注于基本属性,而忽略次要属性。
In the world of programming languages, abstraction is a weapon against the complexity of programming; its purpose is to simplify the programming process. It is an effective weapon because it allows programmers to focus on essential attributes, while ignoring subordinate attributes.
当代编程语言中的两种基本抽象是流程抽象和数据抽象。
The two fundamental kinds of abstraction in contemporary programming languages are process abstraction and data abstraction.
进程抽象的概念是编程语言设计中最古老的概念之一(Plankalkül 在 20 世纪 40 年代支持进程抽象)。所有子程序都是进程抽象,因为它们为程序提供了一种指定进程的方法,而不提供其如何执行任务的细节(至少在调用程序中)。例如,当程序需要对某种类型的数字数据数组进行排序时,它通常会使用子程序进行排序。在需要排序过程的地方,语句如下
The concept of process abstraction is among the oldest in programming language design (Plankalkül supported process abstraction in the 1940s). All subprograms are process abstractions because they provide a way for a program to specify a process, without providing the details of how it performs its task (at least in the calling program). For example, when a program needs to sort an array of numeric data of some type, it usually uses a subprogram for the sorting process. At the point where the sorting process is required, a statement such as
sortInt(list, listLen)
放在程序中。此调用是实际排序过程的抽象,其算法未指定。此调用与被调用子程序中实现的算法无关。
is placed in the program. This call is an abstraction of the actual sorting process, whose algorithm is not specified. The call is independent of the algorithm implemented in the called subprogram.
对于子程序sortInt,唯一必要的属性是要排序的数组的名称、其元素的类型、数组的长度以及对 的调用sortInt将导致数组排序的事实。sortInt实现的特定算法对用户来说不是必需的属性。用户只需查看排序子程序的名称和协议即可使用它。
In the case of the subprogram sortInt, the only essential attributes are the name of the array to be sorted, the type of its elements, the array’s length, and the fact that the call to sortInt will result in the array being sorted. The particular algorithm that sortInt implements is an attribute that is not essential to the user. The user needs to see only the name and protocol of the sorting subprogram to be able to use it.
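To make the point concrete, here is a small Python sketch. The function `sort_int` stands in for the text's sortInt, and the choice of insertion sort is an arbitrary assumption; any other sorting algorithm could replace the body without affecting the call.

```python
def sort_int(values, length):
    """Sort the first `length` elements of `values` in place.

    Insertion sort is used here only as a placeholder algorithm.
    """
    for i in range(1, length):
        key = values[i]
        j = i - 1
        # Shift larger elements one position to the right.
        while j >= 0 and values[j] > key:
            values[j + 1] = values[j]
            j -= 1
        values[j + 1] = key


data = [5, 2, 9, 1]
sort_int(data, len(data))  # the caller relies only on the name and protocol
```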
数据抽象的广泛使用必然遵循过程抽象的广泛使用,因为每个数据抽象不可或缺的部分是它的操作,这些操作被定义为过程抽象。
The widespread use of data abstraction necessarily followed that of process abstraction because an integral and essential part of every data abstraction is its operations, which are defined as process abstractions.
数据抽象的演变始于 1960 年 COBOL 的第一个版本,其中包含记录数据结构。基于C 的语言具有结构,它们也是记录。抽象数据类型是一种数据结构,采用记录的形式,但包含操作其数据的子程序。
The evolution of data abstraction began in 1960 with the first version of COBOL, which included the record data structure. The C-based languages have structs, which are also records. An abstract data type is a data structure in the form of a record that also includes the subprograms that manipulate its data.
从语法上讲,抽象数据类型是一个外壳,它只包含一种特定数据类型的数据表示和为该类型提供操作的子程序。通过访问控制,可以对使用该类型的外壳外部的单元隐藏该类型的不必要细节。使用抽象数据类型的程序单元可以声明该类型的变量,即使实际表示对它们隐藏。抽象数据类型的实例称为对象。
Syntactically, an abstract data type is an enclosure that includes only the data representation of one specific data type and the subprograms that provide the operations for that type. Through access controls, unnecessary details of the type can be hidden from units outside the enclosure that use the type. Program units that use an abstract data type can declare variables of that type, even though the actual representation is hidden from them. An instance of an abstract data type is called an object.
数据抽象的动机之一与过程抽象的动机相似。它是对抗复杂性的武器;是使大型和/或复杂程序更易于管理的手段。本节后面将讨论抽象数据类型的其他动机和优势。
One of the motivations for data abstraction is similar to that of process abstraction. It is a weapon against complexity; a means of making large and/or complicated programs more manageable. Other motivations for and advantages of abstract data types are discussed later in this section.
第12章 介绍的面向对象编程是软件开发中使用数据抽象的产物,数据抽象是其基本组成部分之一。
Object-oriented programming, which is described in Chapter 12, is an outgrowth of the use of data abstraction in software development, and data abstraction is one of its fundamental components.
抽象数据类型的概念,至少就内置类型而言,并不是最近才出现的。所有内置数据类型,甚至是 Fortran I 的内置数据类型,都是抽象数据类型,尽管它们很少被这样称呼。例如,考虑浮点数据类型。大多数编程语言至少包含其中一种。浮点类型提供了创建变量来存储浮点数据的方法,还提供了一组用于操作该类型对象的算术运算。
The concept of an abstract data type, at least in terms of built-in types, is not a recent development. All built-in data types, even those of Fortran I, are abstract data types, although they are rarely called that. For example, consider a floating-point data type. Most programming languages include at least one of these. A floating-point type provides the means to create variables to store floating-point data and also provides a set of arithmetic operations for manipulating objects of the type.
高级语言中的浮点类型采用了数据抽象中的一个关键概念:信息隐藏。内存单元中浮点数据值的实际格式对用户是隐藏的,唯一可用的操作是语言提供的操作。用户不得对该类型的数据创建新的操作,除非可以使用内置操作构造这些操作。用户无法直接操作值的实际表示部分,因为该表示是隐藏的。正是这一特性允许程序在特定语言的实现之间进行可移植性,即使这些实现可能对特定数据类型使用不同的表示。例如,在 20 世纪 80 年代中期出现 IEEE 754 标准浮点表示之前,不同的计算机体系结构使用了几种不同的表示。然而,这种变化并没有阻止使用浮点类型的程序在各种体系结构之间进行移植。
Floating-point types in high-level languages employ a key concept in data abstraction: information hiding. The actual format of the floating-point data value in a memory cell is hidden from the user, and the only operations available are those provided by the language. The user is not allowed to create new operations on data of the type, except those that can be constructed using the built-in operations. The user cannot directly manipulate the parts of the actual representation of values because that representation is hidden. It is this feature that allows program portability between implementations of a particular language, even though the implementations may use different representations for particular data types. For example, before the IEEE 754 standard floating-point representations appeared in the mid-1980s, there were several different representations being used by different computer architectures. However, this variation did not prevent programs that used floating-point types from being portable among the various architectures.
用户定义的抽象数据类型应提供与语言定义类型(例如浮点类型)相同的特性:(1)允许程序单元声明该类型的变量但隐藏该类型对象的表示的类型定义;(2)一组用于操作该类型对象的操作。
A user-defined abstract data type should provide the same characteristics as those of language-defined types, such as a floating-point type: (1) a type definition that allows program units to declare variables of the type but hides the representation of objects of the type; and (2) a set of operations for manipulating objects of the type.
现在我们在用户定义类型的上下文中正式定义抽象数据类型。抽象数据类型是满足以下条件的数据类型:
We now formally define an abstract data type in the context of user-defined types. An abstract data type is a data type that satisfies the following conditions:
该类型对象的表示对于使用该类型的程序单元是隐藏的,因此对这些对象可能进行的唯一直接操作是类型定义中提供的操作。
The representation of objects of the type is hidden from the program units that use the type, so the only direct operations possible on those objects are those provided in the type’s definition.
类型的声明和类型对象操作协议(提供类型接口)包含在单个语法单元中。类型的接口不依赖于对象的表示或操作的实现。此外,其他程序单元也可以创建定义类型的变量。
The declarations of the type and the protocols of the operations on objects of the type, which provide the type’s interface, are contained in a single syntactic unit. The type’s interface does not depend on the representation of the objects or the implementation of the operations. Also, other program units are allowed to create variables of the defined type.
信息隐藏有几个好处。其中之一就是可靠性的提高。使用特定抽象数据类型的程序单元称为客户端。客户端无法操纵直接(有意或无意)访问对象,从而增加此类对象的完整性。只能通过提供的操作来更改对象。
There are several benefits of information hiding. One of these is increased reliability. Program units that use a specific abstract data type are called clients of that type. Clients cannot manipulate the underlying representations of objects directly, either intentionally or by accident, thus increasing the integrity of such objects. Objects can be changed only through the provided operations.
信息隐藏的另一个好处是,它减少了程序员在编写或阅读程序的一部分时必须注意的代码范围和变量数量。特定变量的值只能由有限范围内的代码更改,这使得代码更容易理解,也使得找到错误更改的来源变得更容易。
Another benefit of information hiding is that it reduces the range of code and the number of variables of which a programmer must be aware when writing or reading a part of the program. The value of a particular variable can be changed only by code in a restricted range, making the code easier to understand and making it less challenging to find sources of incorrect changes.
信息隐藏也使得名称冲突不太可能发生,因为变量的范围较小。
Information hiding also makes name conflicts less likely, because the scopes of variables are smaller.
最后,考虑信息隐藏的以下优势:假设堆栈抽象的原始实现使用链接列表表示。稍后,由于该表示存在内存管理问题,堆栈抽象被更改为使用连续表示(在数组中实现堆栈的表示)。由于使用了数据抽象,因此可以在定义堆栈类型的代码中进行此更改,但堆栈抽象的任何客户端都不需要进行任何更改。当然,任何操作协议的更改都需要客户端进行更改。
Finally, consider the following advantage of information hiding: Suppose that the original implementation of the stack abstraction uses a linked list representation. At a later time, because of memory management problems with that representation, the stack abstraction is changed to use a contiguous representation (one that implements a stack in an array). Because data abstraction was used, this change can be made in the code that defines the stack type, but no changes will be required in any of the clients of the stack abstraction. Of course, a change in protocol of any of the operations would require changes in the clients.
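The following Python sketch illustrates this advantage under some assumptions: the class names `LinkedStack` and `ArrayStack`, the operation names, and the `client` function are all hypothetical. Note also that Python hides the representation only by the leading-underscore convention, not by enforced access control.

```python
class LinkedStack:
    """Stack represented as a chain of (element, rest) nodes."""

    def __init__(self):
        self._head = None

    def push(self, element):
        self._head = (element, self._head)

    def pop(self):
        if self._head is None:
            raise IndexError("pop from empty stack")
        self._head = self._head[1]

    def top(self):
        if self._head is None:
            raise IndexError("top of empty stack")
        return self._head[0]

    def empty(self):
        return self._head is None


class ArrayStack:
    """Same operations, contiguous representation (a Python list)."""

    def __init__(self):
        self._items = []

    def push(self, element):
        self._items.append(element)

    def pop(self):
        if not self._items:
            raise IndexError("pop from empty stack")
        self._items.pop()

    def top(self):
        if not self._items:
            raise IndexError("top of empty stack")
        return self._items[-1]

    def empty(self):
        return not self._items


def client(stack):
    """Client code: uses only the operations, never the representation."""
    stack.push("red")
    stack.push("blue")
    first = stack.top()
    stack.pop()
    return first, stack.top(), stack.empty()
```

Calling `client(LinkedStack())` and `client(ArrayStack())` produces the same result, so switching representations requires no change to the client.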
尽管抽象数据类型的定义规定对象的数据成员必须对客户端隐藏,但在许多情况下客户端需要访问这些数据成员。常见的解决方案是提供访问器方法(有时称为getter和setter),允许客户端间接访问所谓的隐藏数据 - 这比简单地将数据公开(这将提供直接访问)更好的解决方案。访问器更好的原因有三:
Although the definition of an abstract data type specifies that data members of objects must be hidden from clients, many situations arise in which clients need to access these data members. The common solution is to provide accessor methods, sometimes called getters and setters, that allow clients indirect access to the so-called hidden data—a better solution than simply making the data public, which would provide direct access. There are three reasons why accessors are better:
可以通过使用 getter 方法但没有相应的 setter 方法来提供只读访问。
Read-only access can be provided by having a getter method but no corresponding setter method.
约束可以包含在 setter 中。例如,如果数据值应限制在特定范围内,setter 可以强制执行该限制。
Constraints can be included in setters. For example, if the data value should be restricted to a particular range, the setter can enforce that.
如果 getter 和 setter 是唯一的访问权限,则可以更改数据成员的实际实现而不会影响客户端。
The actual implementation of the data member can be changed without affecting the clients if getters and setters are the only access.
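The three reasons listed above can be sketched with Python properties. The `Thermostat` type, its fields, and the permitted range are invented purely for illustration.

```python
class Thermostat:
    """Hypothetical type used only to illustrate accessors."""

    def __init__(self, serial, setpoint):
        self._serial = serial
        self._tenths = 0          # internal representation: tenths of a degree
        self.setpoint = setpoint  # routed through the setter below

    @property
    def serial(self):
        """Getter with no matching setter: read-only access."""
        return self._serial

    @property
    def setpoint(self):
        # Clients see degrees, whatever the stored representation is.
        return self._tenths / 10

    @setpoint.setter
    def setpoint(self, degrees):
        """Setter that enforces a range constraint."""
        if not 5 <= degrees <= 35:
            raise ValueError("setpoint out of range")
        self._tenths = round(degrees * 10)


t = Thermostat("A-17", 21.5)
```

Because clients reach the setpoint only through the accessors, the stored form (tenths of a degree here) could later change without affecting them.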
将抽象数据类型中的数据指定为公共数据并为该数据提供访问器方法都违反了抽象数据类型的原则。有些人认为这些只是漏洞,使不完美的设计变得可用。正如我们将在11.4.4.2节 中看到的那样,Ruby 不允许将实例数据设为公共数据。但是,Ruby 也使创建访问器函数变得非常容易。对于开发人员来说,设计所有数据实际上都是隐藏的抽象数据类型是一项挑战。
Both specifying data in an abstract data type to be public and providing accessor methods for that data are violations of the principles of abstract data types. Some believe these are simply loopholes that make an imperfect design usable. As we will see in Section 11.4.4.2, Ruby disallows making instance data public. However, Ruby also makes it very easy to create accessor functions. It is a challenge for developers to design abstract data types in which all of the data is actually hidden.
将类型声明及其操作打包到单个句法单元中的主要优点是,它提供了一种将程序组织成可单独编译的逻辑单元的方法。在某些情况下,实现包含在类型声明中;在其他情况下,它位于单独的句法单元中。将类型及其操作的实现放在不同的句法单元中的优点是它增加了程序的模块化,并且明确地分离了设计和实现。如果类型和操作的声明和定义都在同一个句法单元中,则必须有某种方法向客户端程序单元隐藏指定定义的单元部分。
The primary advantage of packaging the declarations of the type and its operations in a single syntactic unit is that it provides a method of organizing a program into logical units that can be compiled separately. In some cases, the implementation is included with the type declaration; in other cases, it is in a separate syntactic unit. The advantage of having the implementation of the type and its operations in different syntactic units is that it increases the program’s modularity and it is a clear separation of design and implementation. If both the declarations and the definitions of types and operations are in the same syntactic unit, there must be some means of hiding from client program units the parts of the unit that specify the definitions.
堆栈是一种广泛适用的数据结构,它存储一定数量的数据元素,并且只允许访问其一端(即顶部)的数据元素。假设要为具有以下抽象操作的堆栈构建一个抽象数据类型:
A stack is a widely applicable data structure that stores some number of data elements and only allows access to the data element at one of its ends, the top. Suppose an abstract data type is to be constructed for a stack that has the following abstract operations:
请注意,抽象数据类型的某些实现不需要创建和销毁操作。例如,只需将变量定义为抽象数据类型即可隐式创建底层数据结构并对其进行初始化。此类变量的存储空间可能会在变量作用域结束时隐式释放。
Note that some implementations of abstract data types do not require the create and destroy operations. For example, simply defining a variable to be of an abstract data type may implicitly create the underlying data structure and initialize it. The storage for such a variable may be implicitly deallocated at the end of the variable’s scope.
堆栈类型的客户端可能具有如下代码序列:
A client of the stack type could have a code sequence such as the following:
. . .
create(stk1);
push(stk1, color1);
push(stk1, color2);
temp = top(stk1);
. . .
在语言中定义抽象数据类型的工具必须提供一个语法单元,该语法单元包含类型的声明和实现该类型对象操作的子程序的原型。必须能够使这些对抽象的客户端可见。这允许客户端声明抽象类型的变量并操纵它们的值。虽然类型名称必须具有外部可见性,但类型表示必须隐藏。类型表示和实现操作的子程序的定义可以出现在这个语法单元的内部或外部。
A facility for defining abstract data types in a language must provide a syntactic unit that encloses the declaration of the type and the prototypes of the subprograms that implement the operations on objects of the type. It must be possible to make these visible to clients of the abstraction. This allows clients to declare variables of the abstract type and manipulate their values. Although the type name must have external visibility, the type representation must be hidden. The type representation and the definitions of the subprograms that implement the operations may appear inside or outside this syntactic unit.
除了类型定义中提供的那些操作外,抽象数据类型的对象应该提供很少(如果有的话)通用内置操作。适用于各种抽象数据类型的操作并不多。其中包括赋值和相等与不等的比较。如果语言不允许用户重载赋值,则赋值操作必须包含在抽象中。在某些情况下,相等和不等的比较应该在抽象中预定义,但在其他情况下则不必。例如,如果类型实现为指针,相等可能意味着指针相等,但设计者可能希望它意味着指针引用的结构相等。
Few, if any, general built-in operations should be provided for objects of abstract data types, other than those provided with the type definition. There simply are not many operations that apply to a broad range of abstract data types. Among these are assignment and comparisons for equality and inequality. If the language does not allow users to overload assignment, the assignment operation must be included in the abstraction. Comparisons for equality and inequality should be predefined in the abstraction in some cases but not in others. For example, if the type is implemented as a pointer, equality may mean pointer equality, but the designer may want it to mean equality of the structures referenced by the pointers.
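The difference between the two meanings of equality can be seen in a short Python sketch. The `Node` class is hypothetical; Python's `is` operator compares references, which plays the role of pointer equality, while `==` calls the designer-supplied `__eq__`.

```python
class Node:
    """Hypothetical linked-list node with designer-defined equality."""

    def __init__(self, value, next_node=None):
        self.value = value
        self.next_node = next_node

    def __eq__(self, other):
        """Structural equality: the same values in the same order."""
        a, b = self, other
        while isinstance(a, Node) and isinstance(b, Node):
            if a.value != b.value:
                return False
            a, b = a.next_node, b.next_node
        # Equal only if both lists ended at the same time.
        return a is None and b is None


first = Node(1, Node(2))
second = Node(1, Node(2))
same_reference = first is second   # reference (pointer) equality
same_structure = first == second   # equality of the referenced structures
```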
许多抽象数据类型都需要某些操作,但由于这些操作不是通用的,因此它们通常必须由类型的设计者提供。其中包括迭代器、访问器、构造函数和析构函数。第8章 讨论了迭代器。访问器提供了一种对数据进行访问的形式,这种访问形式对客户端的直接访问是隐藏的。构造函数用于初始化新创建对象的部分。析构函数通常用于回收可能由不进行隐式存储回收的语言中的抽象数据类型对象部分使用的堆存储。
Some operations are required by many abstract data types, but because they are not universal, they often must be provided by the designer of the type. Among these are iterators, accessors, constructors, and destructors. Iterators were discussed in Chapter 8. Accessors provide a form of access to data that is hidden from direct access by clients. Constructors are used to initialize parts of newly created objects. Destructors are often used to reclaim heap storage that may be used by parts of abstract data type objects in languages that do not do implicit storage reclamation.
As stated earlier, the enclosure for an abstract data type defines a single data type and its operations. Many contemporary languages, including C++, Java, and C#, directly support abstract data types.
The first design issue is whether abstract data types can be parameterized. For example, if the language supports parameterized abstract data types, one could design an abstract data type for some structure that could store elements of any type. Parameterized abstract data types are discussed in Section 11.5. The second design issue is what access controls are provided and how such controls are specified. Finally, the language designer must decide whether the specification of the type is physically separate from its implementation (or whether that is a developer choice).
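As a rough illustration of the first design issue, a parameterized stack might look like the following in C++. This is a minimal sketch, not the form developed later in the text: the name BoundedStack, the Capacity parameter, and the choice to report failure through a bool result are all assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical parameterized stack: T is the element type,
// Capacity is the fixed maximum number of elements.
template <typename T, std::size_t Capacity = 100>
class BoundedStack {
  private:
    T items[Capacity];        // requires T to be default constructible
    std::size_t count = 0;
  public:
    bool empty() const { return count == 0; }
    bool full() const { return count == Capacity; }
    // Returns false, rather than printing, when the stack is full.
    bool push(const T &item) {
      if (full()) return false;
      items[count++] = item;
      return true;
    }
    // Returns false when there is nothing to remove.
    bool pop() {
      if (empty()) return false;
      --count;
      return true;
    }
    // Precondition: the stack is not empty.
    const T &top() const { return items[count - 1]; }
};
```

A client chooses the element type at the point of declaration, for example BoundedStack<int, 2> or BoundedStack<double>.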
The concept of data abstraction had its origins in SIMULA 67, although that language did not provide complete support for abstract data types, because it did not include a way to hide implementation details. In this section, we describe the support for data abstraction provided by C++, Java, C#, and Ruby.
C++, which was first released in 1985, was created by adding features to C. The first important additions were those to support object-oriented programming. Because one of the primary components of object-oriented programming is abstract data types, C++ obviously is required to support them.
C++ provides two constructs that are very similar to each other, the class and the struct, which directly support abstract data types. Because structs are most commonly used when only data is included, we do not discuss them further here.
C++ classes are types. A C++ program unit that declares an instance of a class can also access any of the public entities in that class, but only through an instance of the class.
Bjarne Stroustrup is the designer and original implementer of C++ and the author of A Tour of C++, Programming—Principles and Practice using C++, The C++ Programming Language, The Design and Evolution of C++, and many other publications. His research interests include distributed systems, design, programming techniques, software development tools, and programming languages. He is actively involved in the ANSI/ISO standardization of C++. Dr. Stroustrup is a Managing Director in the technology division of Morgan Stanley in New York City, a Visiting Professor in Computer Science at Columbia University, and a Distinguished Research Professor in Computer Science at Texas A&M University. He is a member of the National Academy of Engineering, an ACM Fellow, and an IEEE fellow. In 1993, Stroustrup received the ACM Grace Murray Hopper Award “for his early work laying the foundations for the C++ programming language. Based on the foundations and Dr. Stroustrup’s continuing efforts, C++ has become one of the most influential programming languages in the history of computing.”
(year of interview: 2002)
What were you working on, and where, before you joined Bell Labs in the early 1980s? At Bell Labs, I was doing research in the general area of distributed systems. I joined in 1979. Before that, I was finishing my Ph.D. in that field in Cambridge University.
Did you immediately start on “C with Classes” (which would later become C++)? I worked on a few projects related to distributed computing before starting on C with Classes and during the development of that and of C++. For example, I was trying to find a way to distribute the UNIX kernel across several computers and helped a lot of projects build simulators.
Was it an interest in mathematics that got you into this profession? I signed up for a degree in “mathematics with computer science” and my master’s degree is officially a math degree. I—wrongly—thought that computing was some kind of applied math. I did a couple of years of math and rate myself a poor mathematician, but that’s still much better than not knowing math. At the time I signed up, I had never even seen a computer. What I love about computing is the programming rather than the more mathematical fields.
I’d like to work backward, listing some items I think make C++ ubiquitous, and get your reaction. It’s “open source,” nonproprietary, and standardized by ANSI/ISO. The ISO C++ standard is important. There are many independently developed and evolving C++ implementations. Without a standard for them to adhere to and a standards process to help coordinate the evolution of C++, a chaos of dialects would erupt.
It is also important that there are both open-source and commercial implementations available. In addition, for many users, it is crucial that the standard provides a measure of protection from manipulation by implementation providers.
The ISO standards process is open and democratic. The C++ committee rarely meets with fewer than 50 people present and typically more than eight nations are represented at each meeting. It is not just a vendors’ forum.
It’s ideal for systems programming (which, at the time C++ was born, was the largest sector of the market developing code).
Yes, C++ is a strong contender for any systems-programming project. It is also effective for embedded systems programming, which is currently the fastest-growing sector. Yet another growth area for C++ is high-performance numeric/engineering/scientific programming.
Its object-oriented nature and inclusion of classes/libraries make programming more efficient and transparent. C++ is a multiparadigm programming language. That is, it supports several fundamental styles of programming (including object-oriented programming) and combinations of those styles. When used well, this leads to cleaner, more flexible, and more efficient libraries than can be provided using just one paradigm. The C++ standard library containers and algorithms, which is basically a generic programming framework, is an example. When used together with (object-oriented) class hierarchies, the result is an unsurpassed combination of type safety, efficiency, and flexibility.
Its incubation in the AT&T development environment. AT&T Bell Labs provided an environment that was crucial for C++’s development. The labs were an exceptionally rich source of challenging problems and a uniquely supportive environment for practical research. C++ emerged from the same research lab as C did and benefited from the same intellectual tradition, experience, and exceptional people. Throughout, AT&T supported the standardization of C++. However, C++ was not the beneficiary of a massive marketing campaign, like many modern languages. That’s simply not the way the labs work.
Did I miss anything on your top list? Undoubtedly.
Now, let me paraphrase from the C++ critiques and get your reactions: It’s huge/unwieldy. The “hello world” problem is 10 times larger in C++ than in C. C++ is certainly not a small language, but then few modern languages are. If a language is small, you tend to need huge libraries to get work done and often have to rely on conventions and extensions. I prefer to have key parts of the inevitable complexity in the language where it can be seen, taught, and effectively standardized rather than hidden elsewhere in a system. For most purposes, I don’t consider C++ unwieldy. The C++ “hello world” program isn’t larger than its C equivalent on my machine, and it shouldn’t be on yours.
In fact, the object code for the C++ version of the “hello world” program is smaller than the C version on my machine. There is no language reason why the one version should be larger than the other. It is all an issue of how the implementor organized the libraries. If one version is significantly larger than the other, report the problem to the implementor of the larger version.
It’s tougher to program in C++ (compared with C). (Something the critics say.) Even you once admitted it, saying something about shooting yourself in the foot with C versus C++. Yes, I did say something along the lines of “C makes it easy to shoot yourself in the foot; C++ makes it harder, but when you do, C++ blows your whole leg off.” What people tend to miss is that what I said about C++ is to a varying extent true for all powerful languages. As you protect people from simple dangers, they get themselves into new and less obvious problems. Someone who avoids the simple problems may simply be heading for a not-so-simple one. One problem with very supporting and protective environments is that the hard problems may be discovered too late or be too hard to remedy once discovered. Also, a rare problem is harder to find than a frequent one because you don’t suspect it.
It’s appropriate for embedded systems of today but not for the Internet software of today. C++ is suitable for embedded systems today. It is also suitable—and widely used—for “Internet software” today. For example, have a look at my “C++ applications” Web page. You’ll notice that some of the major Web service providers, such as Amazon, Adobe, Google, Quicken, and Microsoft, critically rely on C++. Gaming is a related area in which you find heavy C++ use.
Did I miss another one that you get a lot? Sure.
The data defined in a C++ class are called data members; the functions (methods) defined in a class are called member functions. Data members and member functions appear in two categories: class and instance. Class members are associated with the class; instance members are associated with the instances of the class. In this chapter, only the instance members of a class are discussed. All of the instances of a class share a single set of member functions, but each instance has its own set of the class’s data members. Class instances can be static, stack dynamic, or heap dynamic. If static or stack dynamic, they are referenced directly with value variables. If heap dynamic, they are referenced through pointers. Stack dynamic instances of classes are always created by the elaboration of an object declaration. Furthermore, the lifetime of such a class instance ends when the end of the scope of its declaration is reached. Heap dynamic class instances are created with the new operator and destroyed with the delete operator. Both stack- and heap-dynamic classes can have pointer data members that reference heap dynamic data, so that even though a class instance is stack dynamic, it can include data members that reference heap dynamic data.
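The three storage categories for class instances can be sketched as follows. The class Counter, its members, and the function lifetimesDemo are hypothetical names introduced for this illustration. The sketch uses one class member (the static counter live) solely to make object lifetimes observable, although class members are otherwise outside the scope of this chapter.

```cpp
#include <cassert>

// Hypothetical class that counts its live instances.
class Counter {
  public:
    static int live;   // class member: one copy shared by all instances
    int value;         // instance member: one copy per instance
    explicit Counter(int v) : value(v) { ++live; }
    ~Counter() { --live; }
};
int Counter::live = 0;

Counter globalCounter(1);   // static: exists for the whole program run

// Returns how many of the instances created inside this function are
// still alive when it finishes (0 if both were destroyed).
int lifetimesDemo() {
  int before = Counter::live;
  {
    Counter local(2);                    // stack dynamic: created by its declaration
    Counter *heapPtr = new Counter(3);   // heap dynamic: created with new
    assert(Counter::live == before + 2);
    delete heapPtr;                      // heap dynamic: destroyed with delete
    assert(Counter::live == before + 1);
  }                                      // local is destroyed at the end of its scope
  return Counter::live - before;
}
```

Note that local is referenced directly as a value variable, whereas the heap-dynamic instance is reachable only through the pointer heapPtr.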
A member function of a class can be defined in two distinct ways: Either its complete definition appears in the class, or only its header does. When both the header and the body of a member function appear in the class definition, the member function is implicitly inlined. Recall that this means that its code is placed in the caller’s code, rather than requiring the usual call and return linkage. If only the header of a member function appears in the class definition, its complete definition appears outside the class and is separately compiled. The rationale for allowing member functions to be inlined was to save function call overhead in real-time applications, in which run-time efficiency is of utmost importance. The downside of inlining member functions is that it clutters the class definition interface, resulting in a reduction in readability.
Placing member function definitions outside the class definition separates specification from implementation, a common goal of modern programming.
A C++ class can contain both hidden and visible entities (meaning they are either hidden from or visible to clients of the class). Entities that are to be hidden are placed in a private clause, and visible, or public, entities appear in a public clause. The public clause therefore describes the interface to class instances. There is also a third category of visibility, protected, which makes a member visible to subclasses, but not to clients.
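The three visibility categories can be sketched as follows. The classes Account and AuditedAccount and all of their members are hypothetical names chosen for illustration, and the derivation syntax (: public Account) relies on inheritance, which this section does not cover.

```cpp
#include <cassert>

class Account {
  private:
    int balance;                 // hidden from clients and from subclasses
  protected:
    int auditCount;              // visible to subclasses, hidden from clients
    void noteAudit() { ++auditCount; }
  public:
    Account() : balance(0), auditCount(0) {}
    void deposit(int amount) { if (amount > 0) balance += amount; }
    int getBalance() const { return balance; }   // accessor for hidden data
};

class AuditedAccount : public Account {
  public:
    void audit() { noteAudit(); }                // allowed: protected member
    int audits() const { return auditCount; }    // allowed: protected member
    // int peek() const { return balance; }      // would not compile: private
};
```

A client can call deposit, getBalance, audit, and audits, but any direct reference to balance or auditCount from client code is rejected by the compiler.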
C++ allows the user to include constructor functions in class definitions, which are used to initialize the data members of newly created objects. A constructor may also allocate the heap-dynamic data that are referenced by the pointer members of the new object. Constructors are implicitly called when an object of the class type is created. A constructor has the same name as the class whose objects it initializes. Constructors can be overloaded, but of course each constructor of a class must have a unique parameter profile.
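Constructor overloading can be sketched as follows. The class Buffer and its members are hypothetical. The sketch uses member initializer lists, a C++ notation not introduced in the text, and disables copying only to keep the example short.

```cpp
#include <cassert>

// Hypothetical class with two constructors that differ in parameter profile.
class Buffer {
  private:
    int *data;     // references heap-dynamic storage
    int length;
  public:
    Buffer() : data(new int[100]), length(100) {}             // no parameters
    explicit Buffer(int n) : data(new int[n]), length(n) {}   // one int parameter
    ~Buffer() { delete [] data; }
    Buffer(const Buffer &) = delete;
    Buffer &operator=(const Buffer &) = delete;
    int size() const { return length; }
};
```

The declaration Buffer a; implicitly calls the parameterless constructor, and Buffer b(10); implicitly calls the one that takes an int. The compiler selects between them by the arguments supplied.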
A C++ class can also include a function called a destructor, which is implicitly called when the lifetime of an instance of the class ends. As stated earlier, stack-dynamic class instances can contain pointer members that reference heap-dynamic data. The destructor function for such an instance can include a delete operator on the pointer members to deallocate the heap space they reference. Destructors are often used as a debugging aid, in which case they display or print the values of some or all of the object’s data members before those members are deallocated. The name of a destructor is the class’s name, preceded by a tilde (~).
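The implicit destructor call for a stack-dynamic instance that owns heap-dynamic data can be sketched as follows. The class Scratch and the function destructorRanAtScopeEnd are hypothetical, and the flag parameter exists only so that the destructor call can be observed.

```cpp
#include <cassert>

// Hypothetical class; its destructor records that it ran
// in a flag supplied by the caller.
class Scratch {
  private:
    int *cells;
    bool *destroyedFlag;
  public:
    Scratch(int n, bool *flag) : cells(new int[n]), destroyedFlag(flag) {
      *destroyedFlag = false;
    }
    ~Scratch() {
      delete [] cells;          // release the heap storage the object owns
      *destroyedFlag = true;    // observable side effect, for demonstration only
    }
    Scratch(const Scratch &) = delete;
    Scratch &operator=(const Scratch &) = delete;
};

// Returns true if the destructor ran when the inner block ended.
bool destructorRanAtScopeEnd() {
  bool destroyed = false;
  {
    Scratch s(10, &destroyed);  // stack dynamic; owns heap-dynamic data
    if (destroyed) return false;
  }                             // ~Scratch() is called implicitly here
  return destroyed;
}
```

No explicit call to the destructor appears anywhere in the client code. Reaching the end of the scope of s is what triggers it.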
Neither constructors nor destructors have return types, and neither can return a value. Both constructors and destructors can be explicitly called.
Our example of a C++ abstract data type is a stack:
#include <iostream>
using std::cerr;
class Stack {
  private: //** These members are visible only to other
           //** members and friends (see Section 11.6.4)
    int *stackPtr;
    int maxLen;
    int topSub;
  public: //** These members are visible to clients
    Stack() { //** A constructor
      stackPtr = new int [100];
      maxLen = 99;
      topSub = -1;
    }
    ~Stack() {delete [] stackPtr;} //** A destructor
    void push(int number) {
      if (topSub == maxLen)
        cerr << "Error in push--stack is full\n";
      else stackPtr[++topSub] = number;
    }
    void pop() {
      if (empty())
        cerr << "Error in pop--stack is empty\n";
      else topSub--;
    }
    int top() {
      if (empty()) {
        cerr << "Error in top--stack is empty\n";
        return 9999; //** Arbitrary value so that every path returns
      }
      else
        return (stackPtr[topSub]);
    }
    int empty() {return (topSub == -1);}
};
We discuss only a few aspects of this class definition, because it is not necessary to understand all of the details of the code. Objects of the Stack class are stack dynamic but include a pointer that references heap-dynamic data. The Stack class has three data members—stackPtr, maxLen, and topSub—all of which are private. stackPtr is used to reference the heap-dynamic data, which is the array that implements the stack. The class also has four public member functions—push, pop, top, and empty—as well as a constructor and a destructor. All of the member function definitions are included in this class, although they could have been externally defined. Because the bodies of the member functions are included, they are all implicitly inlined. The constructor uses the new operator to allocate an array of 100 int elements from the heap. It also initializes maxLen and topSub.
The following is an example program that uses the Stack abstract data type:
int main() {
  int topOne;
  Stack stk; //** Create an instance of the Stack class
  stk.push(42);
  stk.push(17);
  topOne = stk.top();
  stk.pop();
  . . .
}
Following is a definition of the Stack class with only prototypes of the member functions. This code is stored in a header file with the .h file name extension. The definitions of the member functions follow the class definition. These use the scope resolution operator, ::, to indicate the class to which they belong. These definitions are stored in a code file with the file name extension .cpp.
// Stack.h - the header file for the Stack class
#include <iostream>
class Stack {
  private: //** These members are visible only to other
           //** members and friends (see Section 11.6.3)
    int *stackPtr;
    int maxLen;
    int topSub;
  public: //** These members are visible to clients
    Stack(); //** A constructor
    ~Stack(); //** A destructor
    void push(int);
    void pop();
    int top();
    int empty();
};

// Stack.cpp - the implementation file for the Stack class
#include <iostream>
#include "Stack.h"
using std::cerr;
Stack::Stack() { //** A constructor
  stackPtr = new int [100];
  maxLen = 99;
  topSub = -1;
}
Stack::~Stack() {delete [] stackPtr;} //** A destructor
void Stack::push(int number) {
  if (topSub == maxLen)
    cerr << "Error in push--stack is full\n";
  else stackPtr[++topSub] = number;
}
void Stack::pop() {
  if (topSub == -1)
    cerr << "Error in pop--stack is empty\n";
  else topSub--;
}
int Stack::top() {
  if (topSub == -1) {
    cerr << "Error in top--stack is empty\n";
    return 9999; //** Arbitrary value so that every path returns
  }
  else
    return (stackPtr[topSub]);
}
int Stack::empty() {return (topSub == -1);}
Java support for abstract data types is similar to that of C++. There are, however, a few important differences. All objects are allocated from the heap and accessed through reference variables. Methods in Java must be defined completely in a class. A method body must appear with its corresponding method header.2 Therefore, a Java abstract data type is both declared and defined in a single syntactic unit. A Java compiler can inline any method that is not overridden. Definitions are hidden from clients by declaring them to be private.
One important advantage of Java’s classes over the classes of C++ is that it uses implicit garbage collection of all objects. This allows the programmer to ignore the issue of deallocation of objects and the clutter of deallocation code in the implementations of abstract data types.
Rather than having private and public clauses in its class definitions, in Java access modifiers can be attached to method and variable definitions. If an instance variable or method does not have an access modifier, it has package access, which is discussed in Section 11.7.2.
The following is a Java class definition for our stack example:
class StackClass {
  private int [] stackRef;
  private int maxLen,
              topIndex;
  public StackClass() { // A constructor
    stackRef = new int [100];
    maxLen = 99;
    topIndex = -1;
  }
  public void push(int number) {
    if (topIndex == maxLen)
      System.out.println("Error in push--stack is full");
    else stackRef[++topIndex] = number;
  }
  public void pop() {
    if (empty())
      System.out.println("Error in pop--stack is empty");
    else --topIndex;
  }
  public int top() {
    if (empty()) {
      System.out.println("Error in top--stack is empty");
      return 9999;
    }
    else
      return (stackRef[topIndex]);
  }
  public boolean empty() {return (topIndex == -1);}
}
An example class that uses StackClass follows:
public class TstStack {
  public static void main(String[] args) {
    StackClass myStack = new StackClass();
    myStack.push(42);
    myStack.push(29);
    System.out.println("29 is: " + myStack.top());
    myStack.pop();
    System.out.println("42 is: " + myStack.top());
    myStack.pop();
    myStack.pop(); // Produces an error message
  }
}
One obvious difference between the Java and the C++ implementations of the stack is the lack of a destructor in the Java version, obviated by Java’s implicit garbage collection.3
Although different in some primarily cosmetic ways, Java’s support for abstract data types is similar to that of C++. Java clearly provides for what is necessary to design abstract data types.
Recall that C# is based on both C++ and Java and that it also includes some new constructs. Like Java, all C# class instances are heap dynamic. Default constructors, which provide initial values for instance data, are predefined for all classes. These constructors provide typical initial values, such as 0 for int types and false for bool types. A user can furnish one or more constructors for any class he or she defines. Such constructors can assign initial values to some or all of the instance data of the class. Any instance variable that is not initialized in a user-defined constructor is assigned a value by the default constructor.
Although C# allows destructors to be defined, because it uses garbage collection for most of its heap objects, destructors are rarely used.
C++ includes both classes and structs, which are nearly identical constructs. The only difference is that the default access modifier for class is private, whereas for structs it is public. C# also has structs, but they are very different from those of C++. In C#, structs are, in a sense, lightweight classes. They can have constructors, properties, methods, and data fields and can implement interfaces but do not support inheritance. One other important difference between structs and classes in C# is that structs are value types, as opposed to reference types. They are allocated on the run-time stack, rather than the heap. If they are passed as parameters, like other value types, by default they are passed by value. All C# value types, including all of its primitive types, are actually structs. Structs can be created by declaring them, like other predefined value types, such as int or float. They can also be created with the new operator, which calls a constructor to initialize them.
Structs are used in C# primarily to implement relatively small simple types that need never be base types for inheritance. They are also used when it is convenient for the objects of the type to be stack as opposed to heap allocated.
C# uses the private and protected access modifiers exactly as they are used in Java.
C# provides properties, which it inherited from Delphi, as a way of implementing getters and setters without requiring explicit method calls by the client. Properties provide implicit access to specific private instance data. For example, consider the following simple class and client code:
public class Weather {
  public int DegreeDays { //** DegreeDays is a property
    get {
      return degreeDays;
    }
    set {
      if(value < 0 || value > 30)
        Console.WriteLine(
          "Value is out of range: {0}", value);
      else
        degreeDays = value;
    }
  }
  private int degreeDays;
  . . .
}
. . .
Weather w = new Weather();
int degreeDaysToday, oldDegreeDays;
. . .
w.DegreeDays = degreeDaysToday;
. . .
oldDegreeDays = w.DegreeDays;
In the class Weather, the property DegreeDays is defined. This property provides a getter method and a setter method for access to the private data member, degreeDays. In the client code following the class definition, degreeDays is treated as if it were a public-member variable, although access to it is available through the property only. Notice the use of the implicit variable value in the setter method. This is the mechanism by which the new value of the property is referenced.
The stack example is not shown here in C#. The only differences between the Java version in Section 11.4.2.1 and the C# version are the output method calls and the use of bool instead of boolean for the return type of the empty method.
Ruby provides support for abstract data types through its classes. In terms of capabilities, Ruby classes are similar to those in C++ and Java.
In Ruby, a class is defined in a compound statement opened with the class reserved word and closed with end. The names of instance variables have a special syntactic form: they must begin with an at sign (@). Instance methods have the same syntax as functions in Ruby: they begin with the def reserved word and end with end. Class methods are distinguished from instance methods by having the class name attached to the beginning of their names with a period separator. For example, in a class named Stack, a class method's name would begin with Stack followed by a period. Constructors in Ruby are named initialize. Because the constructor cannot be overloaded, there can be only one per class.
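The naming rules just described can be seen together in a short sketch. The class Counter and its method names are hypothetical, chosen only for illustration; the conventions shown (class, def, end, the @ prefix, the class-name prefix on class methods, and initialize) are the ones stated in the text.

```ruby
# Counter is a hypothetical class used only to illustrate the syntax rules.
class Counter
  # The single constructor; it runs when Counter.new is called.
  def initialize(start)
    @count = start        # instance variable names begin with @
  end

  # An instance method: opened with def, closed with end.
  def increment
    @count = @count + 1
  end

  # A class method: the class name and a period precede the method name.
  def Counter.describe
    "counts upward from a starting value"
  end
end

c = Counter.new(5)
puts c.increment          # called on an object
puts Counter.describe     # called on the class itself
```

An instance method needs an object as its receiver, whereas the class method is called on the class name directly.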
Classes in Ruby are dynamic in the sense that members can be added at any time. This is done by simply including additional class definitions that specify the new members. Moreover, even predefined classes of the language, such as String, can be extended. For example, consider the following class definition:
class MyClass
  def meth1
    . . .
  end
end
This class could be extended by adding a second method, meth2, with a second class definition:
class MyClass
  def meth2
    . . .
  end
end
Methods can also be removed from a class. This is done by providing another class definition in which the name of the method to be removed is passed to the method remove_method as a parameter. The dynamic classes of Ruby are another example of a language designer trading readability (and as a consequence, reliability) for flexibility. Allowing dynamic changes to classes clearly adds flexibility to the language, while harming readability. To determine the behavior of a class at a particular point in a program, one must find all of its definitions in the program and consider all of them.
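A minimal sketch of both kinds of dynamic change, adding a method and then removing one, might look like the following. The class Greeter and its methods are hypothetical names introduced here; remove_method is the standard Ruby method named in the text, and it takes the method name as a symbol.

```ruby
# Greeter is a hypothetical class used only for illustration.
class Greeter
  def hello
    "hello"
  end
end

# A second definition of the same class adds a method to it.
class Greeter
  def goodbye
    "goodbye"
  end
end

g = Greeter.new
puts g.hello
puts g.goodbye

# A third definition removes hello from the class.
class Greeter
  remove_method :hello
end

# Objects created before the change are affected as well.
puts g.respond_to?(:hello)
puts g.respond_to?(:goodbye)
```

The sketch also shows the readability cost discussed above: the behavior of g depends on three separate definitions of Greeter, and on which of them have been executed so far.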
Access control for methods in Ruby is dynamic, so access violations are detected only during execution. The default method access is public, but it can also be protected or private. There are two ways to specify the access control, both of which use functions with the same names as the access levels, private and public. One way is to call the appropriate function without parameters. This resets the default access for subsequently defined methods in the class. For example,
class MyClass
  def meth1
    . . .
  end
  . . .
  private
  def meth7
    . . .
  end
  . . .
end # of class MyClass
The alternative is to call the access control functions with the names of the specific methods as parameters. For example, the following is semantically equivalent to the previous class definition:
class MyClass
  def meth1
    . . .
  end
  . . .
  def meth7
    . . .
  end
  private :meth7, . . .
end # of class MyClass
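Because access control is checked only during execution, a call to a private method is not rejected until the call is actually made. The following sketch uses both forms of private. The class Account and its methods are hypothetical; NoMethodError is the standard exception Ruby raises for such a call.

```ruby
# Account is a hypothetical class; only private comes from the text.
class Account
  def initialize
    @balance = 0
  end

  def deposit(amount)
    record(amount)        # a private method may be called from inside the class
    @balance = @balance + amount
  end

  def audit
    "balance is #{@balance}"
  end

  private                 # methods defined after this call are private

  def record(amount)
    "deposit of #{amount}"
  end

  private :audit          # the second form: name a specific method
end

acct = Account.new
puts acct.deposit(10)

# The violation is reported only when the call is executed.
begin
  acct.record(5)
rescue NoMethodError => e
  puts "Rejected at run time: #{e.message}"
end
```

Nothing flags the call to record before the program runs; the error appears only when that statement is reached.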
In Ruby, all data members of a class are private, and that cannot be changed. So, data members can be accessed only by the methods of the class, some of which may be accessor methods. In Ruby, instance data that are accessible through accessor methods are called attributes.
For an instance variable named @sum, the getter and setter methods would be as follows:
def sum
  @sum
end
def sum=(new_sum)
  @sum = new_sum
end
Notice that getters are given the name of the instance variable minus the @. The names of setter methods are the same as those of the corresponding getters, except they have an equal sign (=) attached.
Getters and setters can be implicitly generated by the Ruby system by including calls to attr_reader and attr_writer, respectively, in the class definition. The parameters to these calls are symbols that name the attributes, as illustrated in the following:
attr_reader :sum, :total
attr_writer :sum
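To see what these two calls generate, they can be placed in a small class and exercised from client code. The class Tally and its initial values are assumptions made for this sketch; attr_reader and attr_writer are used exactly as in the two lines above.

```ruby
# Tally is a hypothetical class used only for illustration.
class Tally
  attr_reader :sum, :total   # generates the getters sum and total
  attr_writer :sum           # generates the setter sum=

  def initialize
    @sum = 0
    @total = 0
  end
end

t = Tally.new
t.sum = 5                    # calls the generated setter sum=
puts t.sum                   # calls the generated getter sum
puts t.total                 # total has a getter ...
puts t.respond_to?(:total=)  # ... but no setter was generated for it
```

Since total is named only in the attr_reader call, client code can read it but cannot assign to it.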
Following is the stack example written in Ruby:
# Stack.rb - defines and tests a stack of maximum length
# 100, implemented in an array
class StackClass
  # Constructor
  def initialize
    @stackRef = Array.new(100)
    @maxLen = 100
    @topIndex = -1
  end

  # push method
  def push(number)
    # The stack is full when the top index is the last valid subscript
    if @topIndex == @maxLen - 1
      puts "Error in push - stack is full"
    else
      @topIndex = @topIndex + 1
      @stackRef[@topIndex] = number
    end
  end

  # pop method
  def pop
    if empty
      puts "Error in pop - stack is empty"
    else
      @topIndex = @topIndex - 1
    end
  end

  # top method
  def top
    if empty
      puts "Error in top - stack is empty"
    else
      @stackRef[@topIndex]
    end
  end

  # empty method
  def empty
    @topIndex == -1
  end
end # of StackClass

# Test code for StackClass
myStack = StackClass.new
myStack.push(42)
myStack.push(29)
puts "Top element is (should be 29): #{myStack.top}"
myStack.pop
puts "Top element is (should be 42): #{myStack.top}"
myStack.pop
# The following pop should produce an
# error message - stack is empty
myStack.pop
Recall that the notation #{variable} converts the value of the variable to a string, which is then inserted into the string in which it appears. This class defines a stack structure that can store objects of any type.
Recall that in Ruby, everything is an object and arrays are actually arrays of references to objects. That clearly makes this stack more flexible than the similar examples in C++ and Java. Furthermore, simply by passing the desired maximum length to the constructor, objects of this class could have any given maximum length. Of course, because arrays in Ruby have dynamic length, the class could be modified to implement stack objects that are not restricted to any length, except that imposed by the machine’s memory capacity. Because the names of class and instance variables have different forms, Ruby has a slight readability advantage over the other languages discussed in this section.
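The two modifications mentioned here, a maximum length passed to the constructor and a stack with no fixed limit, can be sketched in one class. SizedStack is a hypothetical variant of the stack class, not the text's own code. Treating nil as "no limit" is an assumption made for this sketch, and unlike the original, pop here returns the removed element because it relies on the built-in Array pop.

```ruby
# SizedStack is a hypothetical variant of the stack example.
# Passing nil as the maximum length is taken here to mean "no limit".
class SizedStack
  def initialize(max_len = 100)
    @stack_ref = []          # Ruby arrays grow as needed
    @max_len = max_len
  end

  def push(item)
    if @max_len && @stack_ref.length == @max_len
      puts "Error in push - stack is full"
    else
      @stack_ref.push(item)
    end
  end

  def pop
    if empty
      puts "Error in pop - stack is empty"
    else
      @stack_ref.pop
    end
  end

  def top
    if empty
      puts "Error in top - stack is empty"
    else
      @stack_ref.last
    end
  end

  def empty
    @stack_ref.empty?
  end
end

small = SizedStack.new(2)
small.push(1)
small.push(2)
small.push(3)                # rejected: prints the "stack is full" message

unbounded = SizedStack.new(nil)
500.times { |i| unbounded.push(i) }
puts unbounded.top
```

Because Ruby does not allow the constructor to be overloaded, the default parameter value takes the place of a second, no-argument constructor.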
It is often convenient to be able to parameterize abstract data types. For example, we should be able to design a stack abstract data type that can store elements of any scalar type, rather than being required to write a separate stack abstraction for every different scalar type. Note that this is an issue only for statically typed languages. In a dynamically typed language like Ruby, any stack can implicitly store elements of any type. In fact, different elements of the stack could be of different types. The following three subsections discuss the capabilities of C++, Java 5.0, and C# 2005 to construct parameterized abstract data types.
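The point about dynamically typed languages can be illustrated with a few lines of Ruby. This sketch uses a plain Array as the stack, relying only on the built-in push and pop methods; the element values are arbitrary.

```ruby
# A plain Ruby Array used as a stack; push and pop are built-in Array methods.
stack = []
stack.push(42)             # an integer
stack.push("forty-two")    # a string
stack.push(3.14)           # a float
stack.push([4, 2])         # another array

# Each element keeps its own type; no type parameter was ever declared.
stack.each { |element| puts "#{element.inspect} is a #{element.class}" }

top_element = stack.pop
puts top_element.inspect
```

No declaration anywhere fixes the element type, which is why the parameterization problem does not arise in such languages.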
To make the example C++ stack class of Section 11.4.1 generic in the stack size, only the constructor function needs to be changed, as in the following:
Stack(int size) {
  stackPtr = new int [size];
  maxLen = size - 1;
  topSub = -1;
}
The declaration for a stack object now may appear as follows:
Stack stk(150);
The class definition for Stack can include both constructors, so users can use the default-size stack or specify some other size.
The element type of the stack can be made generic by making the class a templated class. Then, the element type can be a template parameter. The definition of the templated class for a stack type is as follows:
#include <iostream>
using namespace std;

template <typename Type>  // Type is the template parameter
class Stack {
  private:
    Type *stackPtr;
    int maxLen;
    int topSub;
  public:
    // A constructor for 100 element stacks
    Stack() {
      stackPtr = new Type [100];
      maxLen = 99;
      topSub = -1;
    }
    // A constructor for a given number of elements
    Stack(int size) {
      stackPtr = new Type [size];
      maxLen = size - 1;
      topSub = -1;
    }
    // A destructor; delete [] matches the new [] in the constructors
    ~Stack() {delete [] stackPtr;}
    void push(Type number) {
      if (topSub == maxLen)
        cout << "Error in push--stack is full\n";
      else stackPtr[++topSub] = number;
    }
    void pop() {
      if (empty())
        cout << "Error in pop--stack is empty\n";
      else topSub--;
    }
    Type top() {
      if (empty()) {
        cerr << "Error in top--stack is empty\n";
        return Type();  // nothing to return; a default value is a placeholder
      }
      return (stackPtr[topSub]);
    }
    int empty() {return (topSub == -1);}
};
C++ templated classes are instantiated to become typed classes at compile time. For example, an instance of the templated Stack class, as well as an instance of the typed class, can be created with the following declaration:
Stack<int> myIntStack;
However, if an instance of the templated Stack class has already been created for the int type, the typed class need not be created.
Java 5.0 supports a form of parameterized abstract data types in which the generic parameters must be classes. Recall that these were briefly discussed in Chapter 9.
The most common generic types are collection types, such as LinkedList and ArrayList, which were in the Java class library before support for generics was added. The original collection types stored Object class instances, so they could store any objects (but not primitive types). Therefore, the collection types have always been able to store multiple types (as long as they are classes). There were three issues with this: First, every time an object was removed from the collection, it had to be cast to the appropriate type. Second, there was no error checking when elements were added to the collection. This meant that once the collection was created, objects of any class could be added to the collection, even if the collection was meant to store only Integer objects. Third, the collection types could not store primitive types. So, to store int values in an ArrayList, the value first had to be put in an Integer class instance. For example, consider the following code:
//* Create an ArrayList object
ArrayList myArray = new ArrayList();
//* Create an element
myArray.add(0, new Integer(47));
//* Get first object
Integer myInt = (Integer)myArray.get(0);
In Java 5.0, the collection classes, the most commonly used of which is ArrayList, became generic classes. Such classes are instantiated by calling new on the class constructor and passing it the generic parameter in angle brackets. For example, the ArrayList class can be instantiated to store Integer objects with the following statement:
ArrayList<Integer> myArray = new ArrayList<Integer>();
This new class overcomes two of the problems with pre-Java 5.0 collections. Only Integer objects can be put into the myArray collection. Furthermore, there is no need to cast an object being removed from the collection.
Java 5.0 also includes generic collections for linked-lists, queues, and sets.
Users also can define generic classes in Java 5.0. For example, we could have the following:
public class MyClass<T> {
    . . .
}
This class could be instantiated with the following:
MyClass<String> myString;
There are some drawbacks to these user-defined generic classes. For one thing, they cannot store primitives. Second, the elements cannot be indexed. Elements must be added to user-defined generic collections with the add method. Next, we implement the generic stack example using an ArrayList. Note that the last element of an ArrayList is found using the size method, which returns the number of elements in the structure. Elements are deleted from the structure with the remove method. Following is the generic class:
import java.util.*;

public class Stack2<T> {
    private ArrayList<T> stackRef;
    private int maxLen;
    public Stack2() {  // A constructor
        stackRef = new ArrayList<T>();
        maxLen = 99;
    }
    public void push(T newValue) {
        if (stackRef.size() == maxLen)
            System.out.println("Error in push--stack is full");
        else
            stackRef.add(newValue);
    }
    public void pop() {
        if (empty())
            System.out.println("Error in pop--stack is empty");
        else
            stackRef.remove(stackRef.size() - 1);
    }
    public T top() {
        if (empty()) {
            System.out.println("Error in top--stack is empty");
            return null;
        }
        else
            return (stackRef.get(stackRef.size() - 1));
    }
    public boolean empty() {return (stackRef.isEmpty());}
}
This class could be instantiated for the String type with the following:
Stack2<String> myStack = new Stack2<String>();
Recall from Chapter 9 that Java 5.0 supports wildcard classes. For example, Collection<?> is a wildcard class for all collection classes. This allows a method to be written that can accept any collection type as a parameter. Because a collection can itself be generic, the Collection<?> class is in a sense a generic of a generic class.
Some care must be taken with objects of the wildcard type. For example, because the components of a particular object of this type have a type, other type objects cannot be added to the collection. For example, consider
Collection<?> c = new ArrayList<String>();
It would be illegal to use the add method to put anything other than null into this collection through c, because the compiler cannot verify that the new element matches the collection's actual element type, which here is String.
A generic class can easily be defined in Java 5.0 that will work only for a restricted set of types. For example, a class can declare a variable of the generic type and call a method such as compareTo through that variable. If the class is instantiated for a type that does not include a compareTo method, the class cannot be used. To prevent a generic class from being instantiated for a type that does not support compareTo, it could be defined with the following generic parameter:
<T extends Comparable>
Comparable is the interface in which compareTo is declared. If this generic type is used on a class definition, the class cannot be instantiated for any type that does not implement Comparable. The choice of the reserved word extends seems odd here, but its use is related to the concept of a subtype. Apparently, the designers of Java did not want to add another more connotative reserved word to the language.
As was the case with Java, the first version of C# defined collection classes that stored objects of any class. These were ArrayList, Stack, and Queue. These classes had the same problems as the collection classes of pre-Java 5.0.
Generic classes were added to C# in its 2005 version. The five predefined generic collections are Array, List, Stack, Queue, and Dictionary (the Dictionary class implements hashes). Exactly as in Java 5.0, these classes eliminate the problems of allowing mixed types in collections and requiring casts when objects are removed from the collections.
As with Java 5.0, users can define generic classes in C# 2005. One capability of the user-defined C# generic collections is that any of them can be defined to allow its elements to be indexed (accessed through subscripting). Although the indexes are usually integers, an alternative is to use strings as indexes.
One capability that Java 5.0 provides that C# 2005 does not is wildcard classes.
The first five sections of this chapter discussed abstract data types, which are minimal encapsulations. This section describes the multiple-type encapsulations that are needed for larger programs.
When the size of a program reaches beyond a few thousand lines, two practical problems become evident. From the programmer’s point of view, having such a program appear as a single collection of subprograms or abstract data type definitions does not impose an adequate level of organization on the program to keep it intellectually manageable. The second practical problem for larger programs is recompilation. For relatively small programs, recompiling the whole program after each modification is not costly. But for large programs, the cost of recompilation is significant. So, there is an obvious need to find ways to avoid recompilation of the parts of a program that are not affected by a change. The obvious solution to both of these problems is to organize programs into collections of logically related code and data, each of which can be compiled without recompilation of the rest of the program. An encapsulation is such a collection.
Encapsulations are often placed in libraries and made available for reuse in programs other than those for which they were written. People have been writing programs with more than a few thousand lines for at least the last 50 years, so techniques for providing encapsulations have been evolving for some time.
In languages that allow nested subprograms, programs can be organized by nesting subprogram definitions inside the logically larger subprograms that use them. This can be done in Python and Ruby. As discussed in Chapter 5, however, this method of organizing programs, which uses static scoping, is far from ideal. Therefore, even in languages that allow nested subprograms, they are not used as a primary organizing encapsulation construct.
C does not provide complete support for abstract data types, although both abstract data types and multiple-type encapsulations can be simulated.
In C, a collection of related functions and data definitions can be placed in a file, which can be independently compiled. Such a file, which acts as a library, has an implementation of its entities. The interface to such a file, including data, type, and function declarations, is placed in a separate file called a header file. Type representations can be hidden by declaring them in the header file as pointers to struct types. The complete definitions of such struct types need only appear in the implementation file.
The header file, in source form, and the compiled version of the implementation file are furnished to clients. When such a library is used, the header file is included in the client code, using an #include preprocessor specification, so that references to functions and data in the client code can be type checked. The #include specification also documents the fact that the client program depends on the library implementation file. This approach effectively separates the specification and implementation of an encapsulation.
Although these encapsulations work, they create some insecurities. For example, a user could simply cut and paste the definitions from the header file into the client program, rather than using #include. This would work, because #include simply copies the contents of its operand file into the file in which the #include appears. However, there are two problems with this approach. First, the documentation of the dependence of the client program on the library (and its header file) is lost. Second, suppose a user copies the header file into his or her program. Then the author of the library changes both the header and the implementation files. Following this, the user uses the new implementation file with the old header. For example, a variable x could have been defined to be int type in the old header file, which the client code still uses, although the implementation code has been recompiled with the new header file, which defines x to be float. So, the implementation code was compiled with x as a float but the client code was compiled with x as an int. The linker does not detect this error.
Thus, it is the user’s responsibility to ensure that both the header and implementation files are up-to-date. This is often done with a make utility.
Another drawback of this approach lies in the inherent problems of pointers and the potential confusion between assignment and comparison of pointers.
C++ provides two different kinds of encapsulation: header and implementation files can be defined as in C, or class headers and definitions can be defined. Because of the complex interplay of C++ templates and separate compilation, the header files of C++ template libraries often include complete definitions of resources, rather than just data declarations and subprogram protocols; this is due in part to the use of the C linker for C++ programs.
When nontemplated classes are used for encapsulations, the class header file has only the prototypes of the member functions, with the function definitions provided outside the class in a code file, as in the last example in Section 11.4.1.4. This clearly separates interface from implementation.
One language design problem that results from having classes but no generalized encapsulation construct is that sometimes when operations are defined that use two different classes of objects, the operation does not naturally belong in either class. For example, suppose we have an abstract data type for matrices and one for vectors and need a multiplication operation between a vector and a matrix. The multiplication code must have access to the data members of both the vector and the matrix classes, but neither of those classes is the natural home for the code. Furthermore, regardless of which is chosen, access to the members of the other is a problem. In C++, these kinds of situations can be handled by allowing nonmember functions to be “friends” of a class. Friend functions have access to the private entities of the class where they are declared to be friends. For the matrix/vector multiplication operation, one C++ solution is to define the operation outside both the matrix and the vector classes but define it to be a friend of both. The following skeletal code illustrates this scenario:
class Matrix;  //** A class declaration
class Vector {
  friend Vector multiply(const Matrix&, const Vector&);
  . . .
};
class Matrix {  //** The class definition
  friend Vector multiply(const Matrix&, const Vector&);
  . . .
};
//** The function that uses both Matrix and Vector objects
Vector multiply(const Matrix& m1, const Vector& v1) {
  . . .
}
In addition to functions, whole classes can be defined to be friends of a class; then all the private members of the class are visible to all of the members of the friend class.
C# includes a larger encapsulation construct than a class. The construct is the one used by all of the .NET programming languages: the assembly. Assemblies are built by .NET compilers. A .NET application consists of one or more assemblies. An assembly is a file that appears to application programs to be a single dynamic link library (.dll) or an executable (.exe). An assembly defines a module, which can be separately developed. An assembly includes several different components. One of the primary components of an assembly is its programming code, which is in an intermediate language, having been compiled from its source language. In .NET, the intermediate language is named Common Intermediate Language (CIL). It is used by all .NET languages. Because its code is in CIL, an assembly can be used on any architecture, device, or operating system. When executed, the CIL is just-in-time compiled to native code for the architecture on which it is resident.
In addition to the CIL code, a .NET assembly includes metadata that describes every class it defines, as well as all external classes it uses. An assembly also includes a list of all assemblies referenced in the assembly and an assembly version number.
In the .NET world, the assembly is the basic unit of deployment of software. Assemblies can be private, in which case they are available to just one application, or public, which means any application can use them.
As mentioned previously, C# has an access modifier, internal. An internal member of a class is visible to all classes in the assembly in which it appears.
Java has a file structure, called a Java Archive (JAR), that is similar to an assembly. It is also used for deployment of Java software systems. JARs are built with the Java utility jar, rather than a compiler.
We have considered encapsulations to be syntactic containers for logically related software resources—in particular, abstract data types. The purpose of these encapsulations is to provide a way to organize programs into logical units for compilation. This allows parts of programs to be recompiled after isolated changes. There is another kind of encapsulation that is necessary for constructing large programs: a naming encapsulation.
A large program is usually written by many developers, working somewhat independently, perhaps even in different geographic locations. This requires the logical units of the program to be independent, while still able to work together. It also creates a naming problem: How can independently working developers create names for their variables, methods, and classes without accidentally using names already in use by some other programmer developing a different part of the same software system?
Libraries are the origin of the same kind of naming problems. Over the past two decades, large software systems have become progressively more dependent on libraries of supporting software. Nearly all software written in contemporary programming languages requires the use of large and complex standard libraries, in addition to application-specific libraries. This widespread use of multiple libraries has necessitated new mechanisms for managing names. For example, when a developer adds new names to an existing library or creates a new library, he or she must not use a new name that conflicts with a name already defined in a client’s application program or in some other library the program uses. Without some language processor assistance, this is virtually impossible, because there is no convenient way for the library author to know what names a client’s program uses or what names are defined by the other libraries the client program might use.
Naming encapsulations define name scopes that assist in avoiding these name conflicts. Each library can create its own naming encapsulation to prevent its names from conflicting with the names defined in other libraries or in client code. Each logical part of a software system can create a naming encapsulation with the same purpose.
Naming encapsulations are logical encapsulations, in the sense that they need not be physically contiguous. Several different collections of code can be placed in the same namespace, even though they are stored in different places. In the following sections, we briefly describe the uses of naming encapsulations in C++, Java, and Ruby.
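The two properties just described can be sketched in C++. The library names libA and libB and the functions in them are hypothetical: two independently written libraries choose the same name without conflict, and one namespace is extended by a second block that could just as well live in a different file.

```cpp
// Hypothetical library A defines a function named version.
namespace libA {
  int version() { return 1; }
}

// Hypothetical library B, written independently, happens to pick the same name.
// The two definitions do not conflict because each lives in its own name scope.
namespace libB {
  int version() { return 2; }
}

// A namespace need not be physically contiguous: this second block, which
// could be stored in another file, adds a name to libA.
namespace libA {
  int nextVersion() { return version() + 1; }  // unqualified: same namespace
}
```

A client selects the definition it wants by qualification, for example libA::version() or libB::version().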
C++ includes a specification, namespace, that helps programs manage the problem of global namespaces. One can place each library in its own namespace and qualify the names in the program with the name of the namespace when the names are used outside that namespace. For example, suppose there is an abstract data type header file that implements stacks. If there is concern that some other library file may define a name that is used in the stack abstract data type, the file that defines the stack could be placed in its own namespace. This is done by placing all of the declarations for the stack in a namespace block, as in the following:
namespace myStackSpace {
// Stack declarations
}
The implementation file for the stack abstract data type could reference the names declared in the header file with the scope resolution operator, ::, as in
myStackSpace::topSub
The implementation file could also appear in a namespace block specification identical to the one used on the header file, which would make all of the names declared in the header file directly visible. This is definitely simpler, but slightly less readable, because it is less obvious where a specific name in the implementation file is declared.
Client code can gain access to the names in the namespace of the header file of a library in three different ways. One way is to qualify the names from the library with the name of the namespace. For example, a reference to the variable topSub could appear as follows:
myStackSpace::topSub
This is exactly the way the implementation code could reference it if the implementation file was not in the same namespace.
The other two approaches use the keyword using. Applied to a single qualified name (a form that C++ calls a using declaration), it makes that individual name visible, as with
using myStackSpace::topSub;
which makes topSub visible, but not any other names from the myStackSpace namespace.
Applied to a whole namespace (the using directive proper), using makes all of the names from that namespace visible, as in the following:
using namespace myStackSpace;
Code that includes this directive can directly access the names defined in the namespace, as in
p = topSub;
Be aware that namespaces are a complicated feature of C++, and we have introduced only the simplest part of the story here.
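The three access styles can be put side by side in one compilable sketch. The names myStackSpace and topSub come from the discussion above; the variable maxLen, the initial values, and the three function names are assumptions added only so that the difference between the styles is observable.

```cpp
namespace myStackSpace {
  // Stand-ins for the stack declarations. Only topSub is named in the text;
  // maxLen and both initial values are assumed for illustration.
  int topSub = -1;
  int maxLen = 100;
}

// 1. Qualify the name with the namespace name.
int readQualified() {
  return myStackSpace::topSub;
}

// 2. Make one name visible. Inside this function topSub needs no
//    qualification, but maxLen would still have to be written as
//    myStackSpace::maxLen.
int readSingleName() {
  using myStackSpace::topSub;
  return topSub;
}

// 3. Make every name in the namespace visible.
int readAllNames() {
  using namespace myStackSpace;
  return topSub + maxLen;
}
```

Placing each using inside a function body keeps its effect local to that function. The same statements at file scope would affect the rest of the file, which is usually more than a client needs.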
C# includes namespaces that are much like those of C++.
Java includes a naming encapsulation construct: the package. Packages can contain more than one type definition, and the types in a package are partial friends of one another. Partial here means that the entities defined in a type in a package that are public, are protected (see Chapter 12), or have no access specifier are visible to all other types in the package.
Entities without access modifiers are said to have package scope, because they are visible throughout the package. Java therefore has less need for explicit friend declarations and does not include the friend functions or friend classes of C++.
The resources defined in a file are specified to be in a particular package with a package declaration, as in
package stkpkg;
The package declaration must be the first statement in the file; only comments and blank lines may precede it. The resources of every file that does not include a package declaration are implicitly placed in the same unnamed package.
The clients of a package can reference the types defined in the package using fully qualified names. For example, if the package stkpkg has a class named myStack, that class can be referenced in a client of stkpkg as stkpkg.myStack. Likewise, a variable in the myStack object named topSub could be referenced as stkpkg.myStack.topSub. Because this approach can quickly become cumbersome when packages are nested, Java provides the import declaration, which allows shorter references to type names defined in a package. For example, suppose the client includes the following:
import stkpkg.myStack;
Now, the class myStack can be referenced by just its name. To be able to access all of the type names in the package, an asterisk can be used on the import statement in place of the type name. For example, if we wanted to import all of the types in stkpkg, we could use the following:
import stkpkg.*;
Note that Java’s import is only an abbreviation mechanism. No otherwise hidden external resources are made available with import. In fact, in Java nothing is implicitly hidden if it can be found by the compiler or class loader (using the package name and the CLASSPATH environment variable).
Java’s import documents the dependencies of the package in which it appears on the packages named in the import. These dependencies are less obvious when import is not used.
Ruby classes serve as namespace encapsulations, as do the classes of other languages that support object-oriented programming. Ruby has an additional naming encapsulation, called a module. Modules typically define collections of methods and constants. So, modules are convenient for encapsulating libraries of related methods and constants, whose names are in a separate namespace so there are no name conflicts with other names in a program that uses the module. Modules are unlike classes in that they cannot be instantiated or subclassed and do not define variables. Methods that are defined in a module include the module’s name in their names. For example, consider the following skeletal module definition:
module MyStuff
PI = 3.14159265
def MyStuff.mymethod1(p1)
. . .
end
def MyStuff.mymethod2(p2)
. . .
end
end
Assuming the MyStuff module is stored in its own file, a program that wants to use the constant and methods of MyStuff must first gain access to the module. This is done with the require method, which takes the file name in the form of a string literal as a parameter. Then, the constants and methods of the module can be accessed through the module’s name. Consider the following code that uses our example module, MyStuff, which is stored in the file named myStuffMod:
require 'myStuffMod'
. . .
MyStuff.mymethod1(x)
. . .
Modules are further discussed in Chapter 12.
The concept of abstract data types, and their use in program design, was a milestone in the development of programming as an engineering discipline. Although the concept is relatively simple, its use did not become convenient and safe until languages were designed to support it.
The two primary features of abstract data types are the packaging of data objects with their associated operations and information hiding. A language may support abstract data types directly or simulate them with more general encapsulations.
C++ data abstraction is provided by classes. Classes are types, and instances can be either stack or heap dynamic. A member function (method) can have its complete definition appear in the class or have only the protocol given in the class and the definition placed in another file, which can be separately compiled. C++ classes can have two clauses, each prefixed with an access modifier: private or public. Both constructors and destructors can be given in class definitions. Heap-allocated objects must be explicitly deallocated with delete.
Java data abstractions are similar to those of C++, except all Java objects are allocated from the heap and are accessed through reference variables. Also, all objects are garbage collected. Rather than having access modifiers attached to clauses, in Java the modifiers appear on individual declarations (or definitions).
C# supports abstract data types with both classes and structs. Its structs are value types and do not support inheritance. C# classes are similar to those of Java.
Ruby supports abstract data types with its classes. Ruby’s classes differ from those of most other languages in that they are dynamic—members can be added, deleted, or changed during execution.
Ada, C++, Java 5.0, and C# 2005 allow their abstract data types to be parameterized: Ada through its generic packages, C++ through its templated classes, and Java 5.0 and C# through their collection classes and interfaces and user-defined generic classes.
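As a reminder of what parameterization looks like in C++, the following is a minimal sketch of a templated stack. It is not the stack class developed earlier in the chapter: the use of std::vector for storage and the exact member names are assumptions made to keep the example short.

```cpp
#include <vector>

// A stack abstract data type parameterized by its element type T.
// Storage via std::vector is an assumption of this sketch.
template <typename T>
class Stack {
 public:
  void push(const T& value) { items_.push_back(value); }

  // Precondition for pop and top: the stack is not empty.
  void pop() { items_.pop_back(); }
  const T& top() const { return items_.back(); }

  bool empty() const { return items_.empty(); }

 private:
  std::vector<T> items_;  // hidden representation
};
```

A client creates an instance by supplying the type argument, as in Stack<int> or Stack<double>. The compiler generates a separate class for each distinct argument.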
To support the construction of large programs, some contemporary languages include multiple-type encapsulation constructs, which can contain a collection of logically related types. An encapsulation may also provide access control to its entities. Encapsulations provide the programmer with a method of organizing programs that also facilitates recompilation.
Ada, C++, C#, Java, and Ruby provide naming encapsulations. For Ada and Java, they are called packages; for C++ and C#, they are namespaces; for Ruby, they are modules. Partially because of the availability of packages, Java does not have friend functions or friend classes.
What are the two kinds of abstractions in programming languages?
Define abstract data type.
What are the advantages of the two parts of the definition of abstract data type?
What are the language design requirements for a language that supports abstract data types?
What are the language design issues for abstract data types?
From where are C++ objects allocated?
In what different places can the definition of a C++ member function appear?
What is the purpose of a C++ constructor?
What are the legal return types of a constructor?
Where are all Java methods defined?
How are C++ class instances created?
From where are Java class instances allocated?
Why does Java not have destructors?
Where are all Java methods defined?
Where are Java classes allocated?
Why are destructors not as frequently needed in Java as they are in C++?
What is a friend function? What is a friend class?
What is one reason Java does not have friend functions or friend classes?
Describe the fundamental differences between C# structs and its classes.
How is a struct object in C# created?
Explain the three reasons accessors to private types are better than making the types public.
What are the differences between a C++ struct and a C# struct?
What is the name of all Ruby constructors?
What is the fundamental difference between the classes of Ruby and those of C++ and Java?
How are instances of C++ template classes created?
Describe the two problems that appear in the construction of large programs that led to the development of encapsulation constructs.
What problems can occur using C to define abstract data types?
What is a C++ namespace, and what is its purpose?
What is a Java package, and what is its purpose?
Describe a .NET assembly.
What elements can appear in a Ruby module?
Some software engineers believe that all imported entities should be qualified by the name of the exporting program unit. Do you agree? Support your answer.
Suppose someone designed a stack abstract data type in which the function top returned an access path (or pointer) rather than returning a copy of the top element. This is not a true data abstraction. Why? Give an example that illustrates the problem.
Write an analysis of the similarities of and differences between Java packages and C++ namespaces.
Discuss the advantages of C# properties, relative to writing accessor methods in C++ or Java.
Explain the dangers of C’s approach to encapsulation.
Why didn’t C++ eliminate the problems discussed in Problem 5?
Why are destructors rarely used in Java but essential in C++?
What are the arguments for and against the C++ policy on inlining of methods?
Describe a situation where a C# struct is preferable to a C# class.
Explain why naming encapsulations are important for developing large programs.
Describe the three ways a client can reference a name from a namespace in C++.
The namespace of the C# standard library, System, is not implicitly available to C# programs. Do you think this is a good idea? Defend your answer.
What are the advantages and disadvantages of the ability to change objects in Ruby?
Compare Java’s packages with Ruby’s modules.
Design an abstract data type for a matrix with integer elements in a language that you know, including operations for addition, subtraction, and matrix multiplication.
Design a queue abstract data type for float elements in a language that you know, including operations for enqueue, dequeue, and empty. The dequeue operation removes the element and returns its value.
Modify the C++ class for the abstract stack type shown in Section 11.4.1 to use a linked list representation and test it with the same code that appears in this chapter.
Modify the Java class for the abstract stack type shown in Section 11.4.2 to use a linked list representation and test it with the same code that appears in this chapter.
Write an abstract data type for complex numbers, including operations for addition, subtraction, multiplication, division, extraction of each of the parts of a complex number, and construction of a complex number from two floating-point constants, variables, or expressions. Use C++, Java, C#, or Ruby.
Write an abstract data type for queues whose elements store 10-character names. The queue elements must be dynamically allocated from the heap. Queue operations are enqueue, dequeue, and empty. Use either C++, Java, C#, or Ruby.
Write an abstract data type for a queue whose elements can be any primitive type. Use Java 5.0, C# 2005, or C++.
Write an abstract data type for a queue whose elements include both a 20-character string and an integer priority. This queue must have the following methods: enqueue, which takes a string and an integer as parameters; dequeue, which returns the string from the queue that has the highest priority; and empty. The queue is not to be maintained in priority order of its elements, so the dequeue operation must always search the whole queue.
A deque is a double-ended queue, with operations adding and removing elements from either end. Modify the solution to Programming Exercise 7 to implement a deque.
Write an abstract data type for rational numbers (a numerator and a denominator). Include a constructor and methods for getting the numerator, getting the denominator, addition, subtraction, multiplication, division, equality testing, and display. Use Java, C#, C++, or Ruby.
This chapter begins with a brief introduction to object-oriented programming, followed by an extended discussion of the primary design issues for inheritance and dynamic binding. Next, the support for object-oriented programming in Smalltalk, C++, Java, C#, and Ruby is discussed. Following this is a short overview of the implementation of dynamic bindings of method calls to methods in object-oriented languages. The last section discusses reflection.
Languages that support object-oriented programming are now firmly entrenched in the mainstream. From COBOL to LISP, including virtually every language in between, dialects that support object-oriented programming have appeared. C++ supports procedural and data-oriented programming, in addition to object-oriented programming. CLOS, an object-oriented version of LISP (Paepcke, 1993), also supports functional programming. Some of the newer languages that were designed to support object-oriented programming do not support other programming paradigms but still employ some of the basic imperative structures and have the appearance of the older imperative languages. Among these are Java and C#. Ruby is challenging to categorize: It is a pure object-oriented language in the sense that all data are objects, but it is a hybrid language in that one can use it for procedural programming. Finally, there is the pure object-oriented but somewhat unconventional language: Smalltalk. Smalltalk was the first language to offer complete support for object-oriented programming. The details of support for object-oriented programming vary widely among languages, and that is the primary topic of this chapter.
This chapter relies heavily on Chapter 11. It is in fact a continuation of that chapter. This relationship reflects the reality that object-oriented programming is, in essence, an application of the principle of abstraction to abstract data types. Specifically, in object-oriented programming, the commonality of a collection of similar abstract data types is factored out and put in a new type. The members of the collection inherit these common parts from that new type. This feature is inheritance, which is at the center of object-oriented programming and the languages that support it.
The other characterizing feature of object-oriented programming, dynamic binding of method calls to methods, is also extensively discussed in this chapter.
Although object-oriented programming is supported by some of the functional languages, for example, CLOS, OCaml, and F#, those languages are not discussed in this chapter.
The concept of object-oriented programming had its roots in SIMULA 67 but was not fully developed until the evolution of Smalltalk resulted in Smalltalk 80 (in 1980, of course). Indeed, some consider Smalltalk to be the base model for a purely object-oriented programming language. A language that is object oriented must provide support for three key language features: abstract data types, inheritance, and dynamic binding of method calls to methods. Abstract data types were discussed in detail in Chapter 11, so this chapter focuses on inheritance and dynamic binding.
There has long been pressure on software developers to increase their productivity. This pressure has been intensified by the continuing reduction in the cost of computer hardware. By the middle to late 1980s, it became apparent to many software developers that one of the most promising opportunities for increased productivity in their profession was in software reuse. Abstract data types, with their encapsulation and access controls, are obvious candidates for reuse. The problem with the reuse of abstract data types is that, in nearly all cases, the features and capabilities of the existing type are not quite right for the new use. The old type requires at least some minor modifications. Such modifications can be difficult, because they require the person doing the modification to understand part, if not all, of the existing code. In many cases, the person doing the modification is not the program’s original author. Furthermore, the modifications often require changes to all client programs.
A second problem with programming with abstract data types is that the type definitions are all independent and are at the same level. This design often makes it impossible to organize a program to match the problem space being addressed by the program. In many cases, the underlying problem has categories of objects that are related, both as siblings (being similar to each other) and as parents and children (having a descendant relationship).
Inheritance offers a solution to both the modification problem posed by abstract data type reuse and the program organization problem. If a new abstract data type can inherit the data and functionality of some existing type, and is also allowed to modify some of those entities and add new entities, reuse is greatly facilitated without requiring changes to the reused abstract data type. Programmers can begin with an existing abstract data type and design a modified descendant of it to fit a new problem requirement. Furthermore, inheritance provides a framework for the definition of hierarchies of related classes that can reflect the descendant relationships in the problem space.
The abstract data types in object-oriented languages, following the lead of SIMULA 67, are usually called classes. As with instances of abstract data types, class instances are called objects. A class that is defined through inheritance from another class is a derived class, a subclass, or a child class. A class from which the new class is derived is its base class, superclass, or parent class. The subprograms that define the operations on objects of a class are called methods. The calls to methods are sometimes called messages. The entire collection of methods of a class is called the message protocol, or message interface, of the class. Computations in an object-oriented program are specified by messages sent from objects to other objects, or in some cases, to classes.
Methods are similar to subprograms. Both are collections of code that perform some computation. Both can take parameters and return results.
Passing a message is different from calling a subprogram. A subprogram typically processes data that is either passed to it by its caller as a parameter or is accessed nonlocally or globally. A message that is sent to an object is a request to execute one of its methods. At least some of the data on which the method is to operate is part of the object itself. Objects have methods that define processes the object can perform on itself. Because the objects are of abstract data types, these should be the only ways to manipulate data of the object. A subprogram defines a process that it can perform on any data sent to it (or made available nonlocally or globally).
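The contrast can be sketched in C++, where a message send is written as a member function call. The function addTo and the class Accumulator are hypothetical names introduced for illustration.

```cpp
// Subprogram: every piece of data it works on arrives as a parameter.
int addTo(int total, int amount) {
  return total + amount;
}

// Object: part of the data the method operates on belongs to the object.
class Accumulator {
 public:
  // Handles the "add" message by updating the receiver's own state.
  void add(int amount) { total_ += amount; }

  // Reports the state without exposing the variable itself.
  int total() const { return total_; }

 private:
  int total_ = 0;  // can be manipulated only through the methods above
};
```

A caller of addTo must hold the running total itself and pass it in each time. A client of Accumulator sends add and lets the object keep track.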
As a simple example of inheritance, consider the following: Suppose we have a class named Vehicle, which has variables for year, color, and make. A natural specialization, or subclass, of this would be Truck, which could inherit the variables from Vehicle, but would add variables for hauling capacity and number of wheels. Figure 12.1 shows a simple diagram to indicate the relationship between the Vehicle class and the Truck class, in which the arrow points to the parent class.
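One possible rendering of this example in C++ follows. The variable types, the constructor parameters, and the decision to leave the variables public are assumptions made to keep the sketch short; a real abstract data type would hide them behind methods.

```cpp
#include <string>

// Parent class holding the three variables named in the text.
class Vehicle {
 public:
  Vehicle(int y, const std::string& c, const std::string& m)
      : year(y), color(c), make(m) {}

  int year;
  std::string color;
  std::string make;
};

// Truck inherits year, color, and make from Vehicle and adds two variables.
class Truck : public Vehicle {
 public:
  Truck(int y, const std::string& c, const std::string& m,
        double capacity, int wheels)
      : Vehicle(y, c, m),  // initialize the inherited part first
        haulingCapacity(capacity),
        numberOfWheels(wheels) {}

  double haulingCapacity;  // units are left unspecified in this sketch
  int numberOfWheels;
};
```

Because Truck is publicly derived from Vehicle, a Truck object can be used wherever a Vehicle reference is expected, and its inherited variables are accessed exactly as they would be on a Vehicle.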
There are several ways a derived class can differ from its parent.3 Following are the most common differences between a parent class and its subclasses:
The subclass can add variables and/or methods to those inherited from the parent class.
The subclass can modify the behavior of one or more of its inherited methods. A modified method has the same name, and often the same protocol, as the one of which it is a modification.
The parent class can define some of its variables or methods to have private access, which means they will not be visible in the subclass.
The new method is said to override the inherited method, which is then called an overridden method. The purpose of an overriding method is to provide an operation in the subclass that is similar to one in the parent class, but is customized for objects of the subclass. For example, a parent class, Bird, might have a draw method that draws a generic bird. A subclass of Bird named Waterfowl could override the draw method inherited from Bird to draw a generic waterfowl, perhaps a duck.
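The Bird and Waterfowl example could look like this in C++. In this sketch, "drawing" is reduced to returning a descriptive string so that the effect of the override is easy to observe; the strings themselves are invented for illustration.

```cpp
#include <string>

class Bird {
public:
    virtual ~Bird() = default;
    // Draws a generic bird (here: returns a description of the drawing).
    virtual std::string draw() const { return "generic bird"; }
};

class Waterfowl : public Bird {
public:
    // Overrides Bird::draw with a version customized for waterfowl.
    std::string draw() const override { return "generic waterfowl, perhaps a duck"; }
};
```

The overridden method is still reachable from a Waterfowl object through an explicitly qualified call such as `w.Bird::draw()`.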
Classes can have two kinds of methods and two kinds of variables. The most commonly used methods and variables are called instance methods and instance variables. Every object of a class has its own set of instance variables, which store the object’s state. The only difference between two objects of the same class is the state of their instance variables.4 For example, a class for cars might have instance variables for color, make, model, and year. Instance methods operate only on the objects of the class. Class variables belong to the class, rather than to its objects, so there is only one copy for the class. For example, if we wanted to count the number of instances of a class, the counter could not be an instance variable; it would need to be a class variable. Class methods can perform operations on the class, and possibly also on the objects of the class. They can be called by prefixing their names with either the class name or a variable that references an instance of the class. If a class defines a class method, that method can be called even if there are no instances of the class. A class method could be used to create an instance of the class.
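The instance counter mentioned above can be sketched in C++ with a static data member and a static member function, which play the roles of a class variable and a class method. The names `instanceCount` and `count` are hypothetical, and copying is disabled only to keep the counting logic short.

```cpp
#include <string>
#include <utility>

class Car {
public:
    Car(std::string color, std::string make)
        : color(std::move(color)), make(std::move(make)) {
        ++instanceCount;  // one more live instance
    }
    ~Car() { --instanceCount; }
    Car(const Car&) = delete;             // copying disabled to keep the count simple
    Car& operator=(const Car&) = delete;

    // Class method: callable even when no Car object exists.
    static int count() { return instanceCount; }

    // Instance variables: every Car has its own copies.
    std::string color;
    std::string make;

private:
    // Class variable: a single copy shared by the whole class.
    static int instanceCount;
};

int Car::instanceCount = 0;
```

The class method can be called as `Car::count()` or through an instance, as in `someCar.count()`.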
If a new class is a subclass of a single parent class, then the derivation process is called single inheritance. If a class has more than one parent class, the process is called multiple inheritance. When a number of classes are related through single inheritance, their relationships to each other can be shown in a derivation tree. The class relationships in a multiple inheritance can be shown in a derivation graph, such as the one in Figure 12.5 in Section 12.4.2.2.
One disadvantage of inheritance as a means of increasing the possibility of reuse is that it creates dependencies among the classes in an inheritance hierarchy. This result works against one of the advantages of abstract data types, which is that they are independent of each other. Of course, not all abstract data types must be completely independent. But in general, the independence of abstract data types is one of their strongest positive characteristics. However, it may be difficult, if not impossible, to increase the reusability of abstract data types without creating dependencies among some of them. Furthermore, in many cases, the dependencies naturally mirror dependencies in the underlying problem space.
In Chapter 11 the access controls for variables and methods, together often called members, in a class are discussed. Private members are visible inside the class, while public members also are visible to clients of the class. Inheritance brings a new category of possible visibility, subclasses. Private members of a base class are not visible to subclasses, but public members are. The third level of accessibility, protected, allows members of a base class to be visible to subclasses, but not clients.
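The three levels of accessibility can be illustrated with a small C++ sketch. The class and method names are hypothetical; the commented-out line marks an access that a C++ compiler would reject.

```cpp
class Base {
public:
    int publicValue() const { return 1; }
    // Inside Base, all three levels are visible.
    int total() const { return publicValue() + protectedValue() + privateValue(); }

protected:
    int protectedValue() const { return 2; }

private:
    int privateValue() const { return 3; }
};

class Derived : public Base {
public:
    int sumVisible() const {
        // Public and protected members of Base are visible in the subclass.
        return publicValue() + protectedValue();
        // return privateValue();  // would not compile: private in Base
    }
};
```

A client of `Derived` can call `publicValue()`, `total()`, and `sumVisible()`, but neither `protectedValue()` nor `privateValue()`.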
The third essential characteristic (after abstract data types and inheritance) of object-oriented programming languages is a kind of polymorphism5 provided by the dynamic binding of messages to method definitions. This is sometimes called dynamic dispatch. Consider the following situation: There is a base class, A, that defines a method draw that draws some figure associated with the base class. A second class, B, is defined as a subclass of A. Objects of this new class also need a draw method that is like that provided by A but a bit different because the subclass objects are slightly different. So, the subclass overrides the inherited draw method. If a client of A and B has a variable that is a reference to class A’s objects, that reference also could point at class B’s objects, making it a polymorphic reference. If the method draw, which is defined in both classes, is called through the polymorphic reference, the run-time system must determine, during execution, which method should be called, A’s or B’s (by determining which type of object is currently referenced by the reference).6 Figure 12.2 shows this situation.
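A minimal C++ sketch of this situation follows. The client function `render` is hypothetical, and draw returns a string rather than producing graphics so that the selected method is visible.

```cpp
#include <string>

class A {
public:
    virtual ~A() = default;
    virtual std::string draw() const { return "figure for A"; }
};

class B : public A {
public:
    std::string draw() const override { return "figure for B"; }
};

// Client code: 'ref' is a polymorphic reference. The version of draw that
// runs is chosen during execution from the type of the referenced object.
std::string render(const A& ref) {
    return ref.draw();
}
```

The same call `render(x)` yields A's figure or B's figure depending only on the object passed in, even though `render` was written in terms of A.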
Polymorphism is a natural part of any object-oriented language that is statically typed. In a sense, polymorphism makes a statically typed language a little bit dynamically typed, where the little bit is in some bindings of method calls to methods. The type of a polymorphic variable is indeed dynamic.
One purpose of dynamic binding is to allow software systems to be more easily extended during both development and maintenance. Suppose we have a catalog of used cars that is implemented as a Car class and a subclass for each car in the catalog. The subclasses contain an image of the car and specific information about the car. Users can browse the cars with a program that displays the images and information about each car as the user browses to it. The display of each car (and its information) includes a button that the user can click if he or she is interested in that particular car. After the user gets through the catalog, the system will print the images and information about the cars of interest to the user. One way to implement this system is to place a reference to each car (subclass of Car) of interest in a list that can store references to the base class, Car. When the user is ready, information about all of the cars of interest could be printed for the user to study and compare the cars in the list. The catalog of cars will of course change frequently. This will necessitate corresponding changes in the subclasses of Car. However, changes to the collection of subclasses will not require any other changes to the system.
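One way the list of cars of interest might be organized is sketched below in C++. The subclass names, the `describe` method, and the function `printInterestList` are all assumptions for illustration; images are omitted and "printing" is reduced to building a string.

```cpp
#include <memory>
#include <string>
#include <vector>

class Car {
public:
    virtual ~Car() = default;
    virtual std::string describe() const = 0;  // information about one car
};

// Hypothetical catalog entries; new ones can be added without touching
// the code that processes the list.
class BlueSedan : public Car {
public:
    std::string describe() const override { return "blue sedan"; }
};

class RedPickup : public Car {
public:
    std::string describe() const override { return "red pickup"; }
};

// Depends only on the base class Car; describe is dynamically bound.
std::string printInterestList(const std::vector<std::unique_ptr<Car>>& cars) {
    std::string out;
    for (const auto& car : cars) {
        out += car->describe();
        out += '\n';
    }
    return out;
}
```

Because `printInterestList` refers only to `Car`, adding or removing subclasses as the catalog changes requires no change to it.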
In some cases, the design of an inheritance hierarchy results in one or more classes that are so high in the hierarchy that an instantiation of them would not make sense. For example, suppose a program defined a Building class and a collection of subclasses for specific types of buildings, for instance, French_Gothic_Cathedrals. It probably would not make sense to have an implemented draw method in Building. But because all of its descendant classes should have such methods, the protocol (but not the body) of that method is included in Building. Such a method is often called an abstract method (pure virtual method in C++). A class that includes at least one abstract method is called an abstract class (abstract base class in C++). Such a class usually cannot be instantiated, because some of its methods are declared but are not defined (they do not have bodies). Any subclass of an abstract class that is to be instantiated must provide implementations (definitions) of all of the inherited abstract methods.
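In C++ terms, the Building example might be written as follows. This is a sketch only: draw returns a string, and the string chosen for the cathedral is invented.

```cpp
#include <string>
#include <type_traits>

// Abstract class: draw is declared (protocol only) but has no body.
class Building {
public:
    virtual ~Building() = default;
    virtual std::string draw() const = 0;  // pure virtual method
};

// A subclass that is to be instantiated must define every abstract method.
class French_Gothic_Cathedrals : public Building {
public:
    std::string draw() const override { return "cathedral with flying buttresses"; }
};

// Compile-time checks of which classes can be instantiated.
static_assert(std::is_abstract<Building>::value,
              "Building cannot be instantiated");
static_assert(!std::is_abstract<French_Gothic_Cathedrals>::value,
              "the subclass can be instantiated");
```

A declaration such as `Building b;` would be rejected by the compiler, while references and pointers to `Building` remain legal and are useful as polymorphic references.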
A number of issues must be considered when designing the programming language features to support inheritance and dynamic binding. Those that we consider most important are discussed in this section.
A language designer who is totally committed to the object model of computation designs an object system that subsumes all other concepts of type. Everything, from a simple scalar integer to a complete software system, is an object in this mind-set. The advantage of this choice is the elegance and pure uniformity of the language and its use. The primary disadvantage is that simple operations must be done through the message-passing process, which often makes them slower than similar operations in an imperative model, where single machine instructions may implement such simple operations. In this purest model of object-oriented computation, all types are classes. There is no distinction between predefined and user-defined classes. In fact, all classes are treated the same way and all computation is accomplished through message passing.
One alternative to the exclusive use of objects, common in imperative languages to which support for object-oriented programming has been added, is to retain the complete collection of types from the base imperative language and add the object typing model. This approach results in a larger language whose type structure can be confusing to new users of the language.
Another alternative to the exclusive use of objects is to have an imperative-style type structure for the primitive scalar types, but implement all structured types as objects. This choice gives operations on primitive values a speed comparable to that expected in the imperative model.
If a language allows programs in which a variable of a class can be substituted for a variable of one of its ancestor classes in any situation, without causing type errors and without changing the behavior of the program, that language supports the principle of substitution. In such a language, if class B is derived from class A, then B has everything A has and the behavior of an object of class B, when used in place of an object of class A, is identical to that of an object of class A.7 When this is true, B is a subtype of A. Although a subclass that is a subtype of its parent class must expose all of the members that are exposed by its parent class, the subclass can have members that are not in the parent class and still be a subtype.
The subtypes of Ada are examples of predefined subtypes. For example,
subtype Small_Int is Integer range -100..100;
Variables of Small_Int type have all of the operations of Integer variables but can store only a subset of the values possible in Integer. Furthermore, every Small_Int variable can be used anywhere an Integer variable can be used. That is, every Small_Int variable is, in a sense, an Integer variable.
The definition of subtype clearly disallows having public entities in the parent class that are not public in the subclass. So, the derivation process for subtypes must require that public entities of the parent class are inherited as public entities in the subclass.
Not all subclasses are subtypes and not all subtypes are subclasses. For example, a subclass cannot be a subtype if it changes the behavior of one of its overriding methods. Also, a class that is not a subclass of another class can be a subtype of that class by defining the same members, in terms of both types and behavior. A subtype inherits interfaces and behavior, while a subclass inherits implementation, primarily to promote code reuse.
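The following C++ sketch shows a subclass that is not a subtype in the sense described above: the overriding method keeps the protocol of the overridden one but changes its effect. The classes `Counter` and `DoubleCounter` and the client function `countTo` are hypothetical.

```cpp
class Counter {
public:
    virtual ~Counter() = default;
    virtual void increment() { ++value; }  // expected effect: add one
    int get() const { return value; }

protected:
    int value = 0;
};

class DoubleCounter : public Counter {
public:
    // Same protocol as Counter::increment, but a different effect.
    void increment() override { value += 2; }
};

// Client written against Counter's behavior: it expects the result to be n.
int countTo(Counter& c, int n) {
    for (int i = 0; i < n; ++i) {
        c.increment();
    }
    return c.get();
}
```

Substituting a `DoubleCounter` for a `Counter` causes no type error, yet the behavior of `countTo` changes, so the principle of substitution is not satisfied.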
Most statically typed languages that support object-oriented programming are designed so that subclasses are subtypes, unless the programmer specifically designs a subclass that has behavior that differs from that of its parent class.
One obvious question is: Is the issue of whether subclasses are subtypes a theoretical or practical one? It is probably unusual to define a subclass whose overriding methods preserve the type protocols of their corresponding overridden methods but not their effects. So it is not a frequent practical issue. However, requiring all subclasses to be subtypes, if there were a reasonably simple way to enforce that, would place inheritance on a sounder theoretical base.
Another simple design issue for object-oriented languages is: Does the language allow multiple inheritance (in addition to single inheritance)? Maybe it’s not so simple. The purpose of multiple inheritance is to allow a new class to inherit from two or more classes.
Because multiple inheritance is sometimes highly useful, why would a language designer not include it? The reasons lie in two categories: complexity and efficiency. The additional complexity is illustrated by several problems. First, note that if a class has two unrelated parent classes and neither defines a name that is defined in the other, there is no problem. However, suppose a subclass named C inherits from both class A and class B and both A and B define an inheritable method named display. If C needs to reference both versions of display, how can that be done? This ambiguity problem is further complicated when the two parent classes both define identically named methods and one or both of them must be overridden in the subclass.
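C++ is one language that permits this arrangement, and it resolves the name clash by requiring explicit qualification. The sketch below assumes that display returns a string; the method `displayBoth` is hypothetical.

```cpp
#include <string>

class A {
public:
    std::string display() const { return "A::display"; }
};

class B {
public:
    std::string display() const { return "B::display"; }
};

class C : public A, public B {
public:
    // An unqualified call to display() here would be ambiguous, so each
    // version is selected with the scope resolution operator.
    std::string displayBoth() const {
        return A::display() + " and " + B::display();
    }
};
```

A client of C faces the same ambiguity and resolves it the same way, for example with `c.A::display()`.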
Another issue arises if both A and B are derived from a common parent, Z, and C has both A and B as parent classes. This situation is called diamond or shared inheritance. In this case, both A and B should include Z’s inheritable variables. Suppose Z includes an inheritable variable named sum. The question is whether C should inherit both versions of sum or just one, and if just one, which one? There may be programming situations in which just one of the two should be inherited, and others in which both should be inherited. A similar problem occurs when both A and B inherit a method from Z and both override that method. If a client of C, which inherits both overriding methods, calls the method, which method is called, or are both supposed to be called? Diamond inheritance is shown in Figure 12.3.
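C++ lets the programmer choose between the two outcomes for sum. The sketch below places the two variants in namespaces named `separate` and `shared`, which are hypothetical names used only to keep both versions of A, B, C, and Z in one example.

```cpp
namespace separate {
// Ordinary inheritance: C contains two Z subobjects, hence two copies of sum.
struct Z { int sum = 0; };
struct A : Z {};
struct B : Z {};
struct C : A, B {};
}  // namespace separate

namespace shared {
// Virtual inheritance: A and B share one Z subobject, so C has one sum.
struct Z { int sum = 0; };
struct A : virtual Z {};
struct B : virtual Z {};
struct C : A, B {};
}  // namespace shared
```

With `separate::C`, the expression `c.sum` is ambiguous and must be written as `c.A::sum` or `c.B::sum`; with `shared::C`, all three expressions name the same variable.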
The question of efficiency may be more perceived than real. In C++, for example, supporting multiple inheritance requires just one additional array access and one extra addition operation for each dynamically bound method call, at least with some machine architectures (Stroustrup, 1994, p. 270). Although this operation is required even if the program does not use multiple inheritance, it is a small additional cost.
The use of multiple inheritance can easily lead to complex program organizations. Many who have attempted to use multiple inheritance have found that designing the classes to be used as multiple parents is difficult. And the difficulties are not restricted to those created by the initial developer. A class might be used by another developer at some later date as one of the parents of a new class. Maintenance of systems that use multiple inheritance can be a more serious problem, for multiple inheritance leads to more complex dependencies among classes. It is not clear to some that the benefits of multiple inheritance are worth the added effort to design and maintain a system that uses it.
An interface is somewhat similar to an abstract class; its methods are declared but not defined. Interfaces cannot be instantiated. They are used as an alternative to multiple inheritance.8 Interfaces provide some of the benefits of multiple inheritance but have fewer disadvantages. For example, the problems of diamond inheritance are avoided when interfaces, rather than multiple inheritance, are used.
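C++ has no separate interface construct, but a class that contains only pure virtual methods and no data can approximate one. The sketch below uses that approximation; the names `Drawable`, `Serializable`, and `Circle` are hypothetical.

```cpp
#include <string>

// Interface-like classes: methods are declared but not defined, and
// there are no variables that could be inherited twice.
class Drawable {
public:
    virtual ~Drawable() = default;
    virtual std::string draw() const = 0;
};

class Serializable {
public:
    virtual ~Serializable() = default;
    virtual std::string serialize() const = 0;
};

// One class can implement several interfaces.
class Circle : public Drawable, public Serializable {
public:
    std::string draw() const override { return "circle"; }
    std::string serialize() const override { return "shape=circle"; }
};
```

Because the interface-like classes carry no state and no method bodies, a `Circle` can be treated as either a `Drawable` or a `Serializable` without the questions raised by shared inherited variables.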
There are two design questions concerning the allocation and deallocation of objects. The first of these is the place from which objects are allocated. If they behave like the abstract data types, then they can be allocated from anywhere. This means they could be allocated from the run-time stack or explicitly created on the heap with an operator or function, such as new. If they are all heap dynamic, there is the advantage of having a uniform method of creation and access through pointer or reference variables. This design simplifies the assignment operation for objects, making it in all cases only a pointer or reference value change. It also allows references to objects to be implicitly dereferenced, simplifying the access syntax.
If objects are stack dynamic, there is a potential problem with regard to subtypes. If class B is a child of class A and B is a subtype of A, then an object of B type can be assigned to a variable of A type. For example, if b1 is a variable of B type and a1 is a variable of A type, then
a1 = b1;
is a legal statement. If a1 and b1 are references to heap-dynamic objects, there is no problem; the assignment is a simple pointer assignment. However, if a1 and b1 are stack dynamic, then they are value variables and, in an assignment, the value of the object must be copied to the space of the target object. If B adds a data field to what it inherited from A, then a1 will not have sufficient space on the stack for all of b1. The excess will simply be truncated, which could be confusing to programmers who write or use the code. This truncation is called object slicing. The following example and Figure 12.4 illustrate the problem.
class A {
  int x;
  . . .
};
class B : A {
  int y;
  . . .
};
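A runnable variant of this example is sketched below. It makes several assumptions beyond the fragment above: public inheritance and public members are used so that the assignment compiles in client code, and a hypothetical virtual method `fieldCount` is added to make the effect of slicing observable.

```cpp
class A {
public:
    virtual ~A() = default;
    virtual int fieldCount() const { return 1; }  // A has only x
    int x = 1;
};

class B : public A {
public:
    int fieldCount() const override { return 2; }  // B has x and y
    int y = 2;
};

// Stack-dynamic (value) variables: the assignment copies only the A part.
int sliceByValue() {
    A a1;
    B b1;
    a1 = b1;  // y is sliced off; a1 is still an object of class A
    return a1.fieldCount();
}

// A reference to the object: nothing is copied, so nothing is lost.
int noSliceByReference() {
    B b1;
    A& a1 = b1;
    return a1.fieldCount();
}
```

In the first function the object named a1 remains an A after the assignment, whereas in the second the reference a1 still denotes the complete B object.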
The second question here is concerned with those cases where objects are allocated from the heap. The question is whether deallocation is implicit, explicit, or both. If deallocation is implicit, some implicit method of storage reclamation is required. If deallocation can be explicit, that raises the issue of whether dangling pointers or references can be created.
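C++ offers explicit deallocation and, through library types, a form of automatic release, so it can illustrate both sides of this question. In the sketch below, the class `Tracked` and its counter are hypothetical, and the smart pointer stands in for implicit deallocation only in the limited sense that the programmer writes no delete; it is not garbage collection.

```cpp
#include <memory>

// Hypothetical class that counts how many of its objects are alive.
class Tracked {
public:
    Tracked() { ++liveCount; }
    ~Tracked() { --liveCount; }
    Tracked(const Tracked&) = delete;
    Tracked& operator=(const Tracked&) = delete;
    static int liveCount;
};

int Tracked::liveCount = 0;

// Explicit deallocation: the programmer must call delete.
int explicitDeallocation() {
    Tracked* p = new Tracked;
    delete p;     // after this statement, p is a dangling pointer
    p = nullptr;  // resetting it guards against accidental use
    return Tracked::liveCount;
}

// Automatic release: the owning pointer frees the object at scope exit.
int automaticDeallocation() {
    {
        std::unique_ptr<Tracked> p(new Tracked);
    }
    return Tracked::liveCount;
}
```

Only the explicit version creates an opportunity for a dangling pointer, which is the issue raised above.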
As discussed in Section 12.2.3, dynamic binding of messages to methods is an essential part of object-oriented programming. The question here is whether all bindings of messages to methods are dynamic. The alternative is to allow the user to specify whether a specific binding is to be dynamic or static. The advantage of this is that static bindings are faster. So, if a binding need not be dynamic, why pay the price?
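C++ is an example of a language that leaves this choice to the programmer: only methods marked virtual are dynamically bound. The method names in the following sketch are hypothetical.

```cpp
#include <string>

class A {
public:
    virtual ~A() = default;
    // Statically bound: chosen from the declared type of the reference.
    std::string staticName() const { return "A"; }
    // Dynamically bound: chosen from the type of the referenced object.
    virtual std::string dynamicName() const { return "A"; }
};

class B : public A {
public:
    std::string staticName() const { return "B"; }  // hides A::staticName
    std::string dynamicName() const override { return "B"; }
};
```

Through a reference of type `A&` that refers to a B object, `staticName()` yields "A" while `dynamicName()` yields "B".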
One of the primary motivations for nesting class definitions is information hiding. If a new class is needed by only one class, there is no reason to define it so it can be seen by other classes. In this situation, the new class can be nested inside the class that uses it. In some cases, the new class is nested inside a subprogram, rather than directly in another class.
The class in which the new class is nested is called the nesting class. The most obvious design issues associated with class nesting are related to visibility. Specifically, one issue is: Which of the members of the nesting class are visible in the nested class? The other important issue is the opposite: Which of the members of the nested class are visible in the nesting class?
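As an illustration of nesting for information hiding, the C++ sketch below defines a stack whose node class is needed by no other class. The names `IntStack` and `Node` are hypothetical.

```cpp
#include <memory>
#include <utility>

class IntStack {  // the nesting class
public:
    void push(int value) {
        std::unique_ptr<Node> node(new Node(value));
        node->next = std::move(head);
        head = std::move(node);
    }
    int top() const { return head->value; }        // precondition: not empty
    void pop() { head = std::move(head->next); }   // precondition: not empty
    bool empty() const { return head == nullptr; }

private:
    // The nested class: private, so clients of IntStack cannot see it.
    struct Node {
        explicit Node(int v) : value(v) {}
        int value;
        std::unique_ptr<Node> next;
    };

    std::unique_ptr<Node> head;
};
```

In C++, the nesting class can use only the members of the nested class that are accessible to it (here all of them, because `Node` is a struct with public members), while a nested class may access the private members of its nesting class. Other languages answer the two visibility questions differently.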
The initialization issue is whether and how objects are initialized to values when they are created. This is more complicated than may be first thought. One question is whether objects must be initialized manually or through some implicit mechanism. When an object of a subclass is created, is the associated initialization of the inherited parent class members implicit or must the programmer explicitly deal with it?
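C++ supports both answers to the last question, as the following sketch suggests. The class names and the `log` variable, which records which constructors ran, are hypothetical.

```cpp
#include <string>

class Parent {
public:
    Parent() : log("Parent()") {}
    explicit Parent(int) : log("Parent(int)") {}
    std::string log;  // records the constructors that were executed
};

// Implicit: the parent's parameterless constructor runs automatically.
class ImplicitChild : public Parent {
public:
    ImplicitChild() { log += " ImplicitChild()"; }
};

// Explicit: the programmer selects a parent constructor by name.
class ExplicitChild : public Parent {
public:
    ExplicitChild() : Parent(42) { log += " ExplicitChild()"; }
};
```

In both cases the inherited members are initialized before the body of the subclass constructor executes.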
Many think of Smalltalk as the definitive object-oriented programming language. It was the first language to include complete support for that paradigm. Therefore, it is natural to begin a survey of language support for object-oriented programming with Smalltalk.
In Smalltalk, the concept of an object is truly universal. Virtually everything, from things as simple as the integer constant 2 to a complex file-handling system, is an object. As objects, they are treated uniformly. They all have local memory, inherent processing ability, the capability to communicate with other objects, and the possibility of inheriting methods and instance variables from ancestors. Classes cannot be nested in Smalltalk.
All computation is through messages, even a simple arithmetic operation. For example, the expression x + 7 is implemented as sending the + message to x (to enact the + method), sending 7 as the parameter. This operation returns a new numeric object with the result of the addition.
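Readers more familiar with C++ may find a rough analogy in operator overloading, where an expression such as x + 7 is also a call to a method of x. The sketch below is only an analogy: the wrapper class `Number` is hypothetical, and in C++ this call is bound statically, whereas a Smalltalk message is always bound dynamically.

```cpp
// Hypothetical wrapper that treats an integer as an object.
class Number {
public:
    explicit Number(int value) : value(value) {}

    // The "+ method": takes the argument and returns a new Number object.
    Number operator+(const Number& argument) const {
        return Number(value + argument.value);
    }

    int get() const { return value; }

private:
    int value;
};

int addSeven(int start) {
    Number x(start);
    Number viaOperator = x + Number(7);         // operator syntax
    Number viaMethod = x.operator+(Number(7));  // the same call as a method call on x
    return viaOperator.get() == viaMethod.get() ? viaMethod.get() : -1;
}
```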
Replies to messages have the form of objects and are used to return requested or computed information or only to confirm that the requested service has been completed.
All Smalltalk objects are allocated from the heap and are referenced through reference variables, which are implicitly dereferenced. There is no explicit deallocation statement or operation. So, all deallocation is implicit, using a garbage collection process for storage reclamation.
In Smalltalk, constructors must be explicitly called when an object is created. A class can have multiple constructors, but each must have a unique name.
Smalltalk classes cannot be nested in other classes.
Unlike a hybrid language such as C++, Smalltalk was designed for just one software development paradigm—object oriented. Furthermore, it adopts none of the appearance of the imperative languages. Its purity of purpose is reflected in its simple elegance and uniformity of design.
There is an example Smalltalk program in Chapter 2.
A Smalltalk subclass inherits all of the members of its superclass. The subclass can also have its own instance variables, which must have names that are distinct from the variable names in its ancestor classes. Finally, the subclass can define new methods and redefine methods that already exist in an ancestor class. When a subclass has a method whose name and protocol are the same as those of a method in an ancestor class, the subclass method hides that of the ancestor class. Access to such a hidden method is provided by prefixing the message with the pseudovariable super. The prefix causes the method search to begin in the superclass rather than locally.
Because members in a parent class cannot be hidden from subclasses, subclasses can be and usually are subtypes.
Smalltalk does not support multiple inheritance.
The dynamic binding of messages to methods in Smalltalk operates as follows: A message to an object causes a search of the class to which the object belongs for a corresponding method. If the search fails, it is continued in the superclass of that class, and so forth, up to the system class, Object, which has no superclass. Object is the root of the class derivation tree on which every class is a node. If no method is found anywhere in that chain, an error occurs. It is important to remember that this method search is dynamic—it takes place when the message is sent. Smalltalk does not, under any circumstances, bind messages to methods statically.
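The search described above can be modeled in a few lines of C++. This is a simulation of the lookup rule, not a description of how any Smalltalk system is implemented; the `ClassInfo` structure, the `send` function, and the error text are all assumptions.

```cpp
#include <functional>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical model of a class: a method table plus a superclass link.
struct ClassInfo {
    std::string name;
    const ClassInfo* superclass;  // nullptr for the root class, Object
    std::map<std::string, std::function<std::string()>> methods;
};

// Performed at the time the message is sent: search the receiver's class,
// then each superclass in turn, up to the root of the derivation tree.
std::string send(const ClassInfo& receiverClass, const std::string& message) {
    for (const ClassInfo* c = &receiverClass; c != nullptr; c = c->superclass) {
        auto found = c->methods.find(message);
        if (found != c->methods.end()) {
            return found->second();
        }
    }
    // No method anywhere in the chain: the only kind of type error.
    throw std::runtime_error("message not understood: " + message);
}
```

A method defined in a subclass is found first and therefore hides a method of the same name in an ancestor, which matches the hiding rule for Smalltalk subclasses.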
The only type checking in Smalltalk is dynamic, and the only type error occurs when a message is sent to an object that has no matching method, either locally or through inheritance. This is a different concept of type checking than that of most other languages. Smalltalk type checking has the simple goal of ensuring that a message matches some method.
Smalltalk variables are not typed; any name can be bound to any object. As a direct result, Smalltalk supports dynamic polymorphism. All Smalltalk code is generic in the sense that the types of the variables are irrelevant, as long as they are consistent. The meaning of an operation (method or operator) on a variable is determined by the class of the object to which the variable is currently bound.
The point of this discussion is that as long as the objects referenced in an expression have methods for the messages of the expression, the types of the objects are irrelevant. This means that no code is tied to a particular type.
Smalltalk is a small language, although the Smalltalk system is large. The syntax of the language is simple and highly regular. It is a good example of the power that can be provided by a small language if that language is built around a simple but powerful concept. In the case of Smalltalk, that concept is that all programming can be done employing only a class hierarchy built using inheritance, objects, and message passing.
In comparison with conventional compiled imperative-language programs, equivalent Smalltalk programs are significantly slower. Although it is theoretically interesting that array indexing and loops can be provided within the message-passing model, efficiency is an important factor in the evaluation of programming languages. Therefore, efficiency will clearly be an issue in most discussions of the practical applicability of Smalltalk.
Smalltalk’s dynamic binding allows type errors to go undetected until run time. A program can be written that includes messages to nonexistent methods, and the errors will not be detected until the messages are sent, which causes a great deal more error repair later in the development than would occur in a statically typed language. However, in practice type errors are not a serious problem with Smalltalk programs.
Overall, the design of Smalltalk consistently came down on the side of language elegance and strict adherence to the principles of object-oriented programming support, often without regard for practical matters, in particular execution efficiency. This is most obvious in the exclusive use of objects and the typeless variables.
The Smalltalk user interface has had an important impact on computing: The integrated use of windows, mouse-pointing devices, and pop-up and pull-down menus, all of which first appeared in Smalltalk, dominates contemporary software systems.
Perhaps the greatest impact of Smalltalk is the advancement of object-oriented programming, now the most widely used design and coding methodology.
Chapter 2 describes how C++ evolved from C and SIMULA 67, with the design goal of support for object-oriented programming while retaining nearly complete backward compatibility with C. C++ classes, as they are used to support abstract data types, are discussed in Chapter 11. C++ support for the other essentials of object-oriented programming is explored in this section. The whole collection of details of C++ classes, inheritance, and dynamic binding is large and complex. This section discusses only the most important among these topics, specifically, those directly related to the design issues described in Section 12.3.
C++ was the first widely used object-oriented programming language and is still among the most popular. So, naturally, it is the one with which other languages are often compared. For both of these reasons, our coverage of C++ here is more detailed than that of the other example languages discussed in this chapter.
To maintain backward compatibility with C, C++ retains the type system of C and adds classes to it. Therefore, C++ has both traditional imperative-language types and the class structure of an object-oriented language. It supports methods, as well as functions that are not related to specific classes. This makes it a hybrid language, supporting both procedural programming and object-oriented programming.
The objects of C++ can be static, stack dynamic, or heap dynamic. Explicit deallocation using the delete operator is required for heap-dynamic objects, because C++ does not include implicit storage reclamation.
Many class definitions include a destructor method, which is implicitly called when an object of the class ceases to exist. The destructor is used to deallocate heap-allocated memory that is referenced by data members. It may also be used to record part or all of the state of the object just before it dies, usually for debugging purposes.
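As a minimal sketch of explicit deallocation and implicit destructor calls, the following uses a hypothetical class, IntBuffer, that is not part of the text. Its constructor allocates heap storage and its destructor releases it; the static counter live_count is an assumption added only to make object lifetimes observable.

```cpp
#include <cstddef>

// Hypothetical class: owns a heap-allocated array of ints.
class IntBuffer {
public:
    explicit IntBuffer(std::size_t n) : size_(n), data_(new int[n]()) {
        ++live_count;                 // one more object exists
    }
    ~IntBuffer() {                    // implicitly called when the object dies
        delete[] data_;               // release storage referenced by a data member
        --live_count;
    }
    IntBuffer(const IntBuffer&) = delete;             // copying is disabled to
    IntBuffer& operator=(const IntBuffer&) = delete;  // avoid double deletion
    std::size_t size() const { return size_; }
    static int live_count;            // illustration only: number of live objects
private:
    std::size_t size_;
    int* data_;
};
int IntBuffer::live_count = 0;

// Returns how many IntBuffer objects were alive inside the function.
int live_inside_scope() {
    IntBuffer stack_obj(4);                  // stack dynamic: destroyed at scope exit
    IntBuffer* heap_obj = new IntBuffer(8);  // heap dynamic
    int seen = IntBuffer::live_count;
    delete heap_obj;                         // explicit deallocation is required
    return seen;
}
```

The stack-dynamic object is destroyed automatically when the function returns, whereas the heap-dynamic object would leak if the delete were omitted.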
Bjarne Stroustrup is the designer and original implementer of C++ and the author of A Tour of C++, Programming: Principles and Practice Using C++, The C++ Programming Language, The Design and Evolution of C++, and many other publications. His research interests include distributed systems, design, programming techniques, software development tools, and programming languages. He is actively involved in the ANSI/ISO standardization of C++. Dr. Stroustrup is a managing director in the technology division of Morgan Stanley in New York City, a Visiting Professor in Computer Science at Columbia University, and a Distinguished Research Professor in Computer Science at Texas A&M University. He is a member of the National Academy of Engineering, an ACM Fellow, and an IEEE Fellow. In 1993, Stroustrup received the ACM Grace Murray Hopper Award “for his early work laying the foundations for the C++ programming language. Based on the foundations and Dr. Stroustrup’s continuing efforts, C++ has become one of the most influential programming languages in the history of computing.”
(year of interview: 2002)
Your thoughts on the object-oriented paradigm: Its pluses and minuses. Let me first say what I mean by OOP—too many people think that “object-oriented” is simply a synonym for “good.” If so, there would be no need for other paradigms. The key to OO is the use of class hierarchies providing polymorphic behavior through some rough equivalent of virtual functions. For proper OO, it is important to avoid directly accessing the data in such a hierarchy and to use only a well-designed functional interface.
In addition to its well-documented strengths, object-oriented programming also has obvious weaknesses. In particular, not every concept naturally fits into a class hierarchy, and the mechanisms supporting object-oriented programming can impose significant overheads compared to alternatives. For many simple abstractions, classes that do not rely on hierarchies and run-time binding provide a simpler and more efficient alternative. Furthermore, where no run-time resolution is needed, generic programming relying on (compile-time) parametric polymorphism is a better behaved and more efficient approach.
So, C++: Is it OO or other? C++ supports several paradigms—including OOP, generic programming, and procedural programming—and combinations of these paradigms define multiparadigm programming as supporting more than one programming style (“paradigm”) and combinations of those styles.
Do you have a mini-example of multiparadigm programming? Consider this variant of the classic “collection of shapes” examples (originating from the early days of the first language to support object-oriented programming: Simula 67):
void draw_all(const vector<Shape*>& vs)
{
  for (int i = 0; i<vs.size(); ++i)
    vs[i]->draw();
}
Here, I use the generic container vector together with the polymorphic type Shape. The vector provides static type safety and optimal run-time performance. The Shape provides the ability to handle a Shape (i.e., any object of a class derived from Shape) without recompilation.
We can easily generalize this to any container that meets the C++ standard library requirements:
template<class C>
void draw_all(const C& c)
{
  typedef typename C::const_iterator CI;
  for (CI p = c.begin(); p!=c.end(); ++p)
    (*p)->draw();
}
Using iterators allows us to apply this draw_all() to containers that do not support subscripts, such as a standard library list:
vector<Shape*> vs;
list<Shape*> ls;
// . . .
draw_all(vs);
draw_all(ls);
We can even generalize this further to handle any sequence of elements defined by a pair of iterators:
template<class Iterator> void
draw_all(Iterator b, Iterator e)
{
  for_each(b,e,mem_fun(&Shape::draw));
}
To simplify the implementation, I used the standard library algorithm for_each.
We might call this last version of draw_all() for a standard library list and an array:
list<Shape*> ls;
Shape* as[100];
// . . .
draw_all(ls.begin(),ls.end());
draw_all(as,as+100);
How useful is it to have this background in numerous paradigms? Or would it be better to invest time in becoming even more familiar with OO languages rather than learning these other paradigms? It is essential for anyone who wants to be considered a professional in the areas of software to know several languages and several programming paradigms. Currently, C++ is the best language for multiparadigm programming and a good language for learning various forms of programming. However, it’s not a good idea to know just C++, let alone to know just a single-paradigm language. That would be a bit like being colorblind or monoglot: You would hardly know what you were missing. Much of the inspiration to good programming comes from having learned and appreciated several programming styles and seen how they can be used in different languages.
Furthermore, I consider programming of any nontrivial program a job for professionals with a solid and broad education, rather than for people with a hurried and narrow “training.”
A C++ class can be derived from an existing class, which is then its parent, or base, class. Unlike Smalltalk and most other languages that support object-oriented programming, a C++ class can also be stand-alone, without a superclass. In the definition of a derived class, the name of the derived class has the name of the base class attached with a colon (:), as in the following syntactic form:
class derived_class_name : base_class_name { ... }
The data defined in a class definition are called data members of that class, and the functions defined in a class definition are called member functions of that class (member functions in other languages are usually called methods). Some or all of the members of the base class may be inherited by the derived class, which can also add new members and modify inherited member functions.
All C++ objects must be initialized before they are used. Therefore, all C++ classes include at least one constructor method that initializes the data members of the new object. Constructor methods are implicitly called when an object is created. If any of the data members are pointers to heap-allocated data, the constructor allocates that storage.
If a class is derived from another class, the inherited data members must be initialized when the derived class object is created. To do this, the base class constructor is implicitly called. When initialization data must be furnished to the base class constructor, it is given in the call to the derived object constructor. In general, this is done with the following construct:
subclass(subclass parameters): base_class(superclass parameters) {
  ...
}
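A concrete instance of this construct might look like the following sketch. The classes Account and SavingsAccount, along with their members, are hypothetical names chosen for illustration and do not come from the text.

```cpp
// Hypothetical base class whose constructor needs initialization data.
class Account {
public:
    explicit Account(int opening_balance) : balance_(opening_balance) {}
    int balance() const { return balance_; }
private:
    int balance_;
};

// The derived-class constructor forwards part of its own parameter
// list to the base-class constructor before its own body runs.
class SavingsAccount : public Account {
public:
    SavingsAccount(int opening_balance, int rate_percent)
        : Account(opening_balance),      // data furnished to the base constructor
          rate_percent_(rate_percent) {}
    int yearly_interest() const { return balance() * rate_percent_ / 100; }
private:
    int rate_percent_;
};
```

Because Account has no constructor that can be called without arguments, omitting Account(opening_balance) from the initializer list would be a compile-time error.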
If no constructor is included by the developer in a class definition, the compiler includes a trivial constructor. This default constructor calls the constructor of the base class, if there is a base class.
Class members can be private, protected, or public. Private members are accessible only by member functions and friends of the class. Standalone functions, member functions, and classes can be declared to be friends of a class and thereby be given access to its private members. Public members are visible everywhere. Protected members are like private members, except in derived classes, whose access is described next. Derived classes can modify accessibility for their inherited members. The complete syntactic form of a derived class is as follows:
class derived_class_name : derivation_mode base_class_name
{data member and member function declarations};
The derivation_mode can be either public or private.9 (Do not confuse public and private derivation with public and private members.) The public and protected members of a base class are also public and protected, respectively, in a public-derived class. In a private-derived class, both the public and protected members of the base class are private. So, in a class hierarchy, a private-derived class cuts off access to all members of all ancestor classes to all successor classes. Private members of a base class are inherited by a derived class, but they are not visible to the members of that derived class and are therefore of no use there. Private derivations provide the possibility that a subclass can have members with different access than the same members in the parent class. Consider the following example:
class base_class {
  private:
    int a;
    float x;
  protected:
    int b;
    float y;
  public:
    int c;
    float z;
};
class subclass_1 : public base_class {. . .};
class subclass_2 : private base_class {. . .};
In subclass_1, b and y are protected, and c and z are public. In subclass_2, b, y, c, and z are private. No derived class of subclass_2 can have members with access to any member of base_class. The data members a and x in base_class are not accessible in either subclass_1 or subclass_2.
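The access rules just listed can be checked with a compilable sketch. It is a simplified variant of the example: the float members are omitted, default member initializers (C++11) supply values, and the member function sum_visible and the class sub_sub_2 are hypothetical additions.

```cpp
class base_class {
  private:
    int a = 1;
  protected:
    int b = 2;
  public:
    int c = 3;
};

class subclass_1 : public base_class {
  public:
    // b is protected and c is public here; a is not accessible.
    int sum_visible() const { return b + c; }
};

class subclass_2 : private base_class {
  public:
    // b and c are usable inside subclass_2, but both are now private.
    int sum_visible() const { return b + c; }
};

class sub_sub_2 : public subclass_2 {
    // int peek() const { return c; }   // error: c is private in subclass_2
};
```

A client can read the c member of a subclass_1 object directly, but the same access on a subclass_2 object is rejected by the compiler.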
Note that private-derived subclasses cannot be subtypes. For example, if the base class has a public data member, under private derivation that data member would be private in the subclass. Therefore, if an object of the subclass were substituted for an object of the base class, accesses to that data member would be illegal on the subclass object. However, public-derived subclasses can be and usually are subtypes.
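One way to observe this distinction at compile time is with the standard type trait std::is_convertible, which asks whether a pointer to the derived class can be used where a pointer to the base class is expected. The class names Base, PublicSub, and PrivateSub below are hypothetical.

```cpp
#include <type_traits>

class Base { public: int c = 0; };
class PublicSub  : public  Base {};
class PrivateSub : private Base {};

// A PublicSub* converts implicitly to Base*, so PublicSub objects can
// stand in for Base objects in client code.
constexpr bool public_is_substitutable =
    std::is_convertible<PublicSub*, Base*>::value;

// Outside PrivateSub, the base class is inaccessible, so the same
// conversion is rejected.
constexpr bool private_is_substitutable =
    std::is_convertible<PrivateSub*, Base*>::value;

static_assert(public_is_substitutable, "public derivation permits substitution");
static_assert(!private_is_substitutable, "private derivation forbids substitution");
```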
Under private class derivation, no member of the parent class is implicitly visible to the instances of the derived class. Any member that must be made visible must be reexported in the derived class. This reexportation in effect exempts a member from being hidden even though the derivation was private. For example, consider the following class definition:
class subclass_3 : private base_class {
  public:
    using base_class :: c;   // reexport c; the older form, base_class :: c;,
                             //  is no longer accepted as of C++11
    . . .
};
Now, instances of subclass_3 can access c. As far as c is concerned, it is as if the derivation had been public. The double colon (::) in this class definition is a scope resolution operator. It specifies the class where its following entity is defined.
The example in the following paragraphs illustrates the purpose and use of private derivation.
Consider the following example of C++ inheritance, in which a general linked-list class is defined and then used to define two useful subclasses:
class single_linked_list {
  private:
    class node {
      public:
        node *link;
        int contents;
    };
    node *head;
  public:
    single_linked_list() {head = 0;}
    void insert_at_head(int);
    void insert_at_tail(int);
    int remove_at_head();
    int empty();
};
The nested class, node, defines a cell of the linked list to consist of an integer variable and a pointer to a node object. The node class is in the private clause, which hides it from all other classes. Its members are public, however, so they are visible to the nesting class, single_linked_list. If they were private, node would need to declare the nesting class to be a friend to make them visible in the nesting class. Note that under the original C++ standard, nested classes have no special access to members of the nesting class. Only static data members of the nesting class are visible to methods of the nested class.10 Since C++11, a nested class is treated as a member of the nesting class and has the same access rights as any other member, although it still needs an object of the nesting class to reach nonstatic members.
The enclosing class, single_linked_list, has just a single data member, a pointer to act as the list’s header. It contains a constructor function, which sets head to the null pointer value. The four member functions allow nodes to be inserted at either end of a list object, nodes to be removed from one end of a list, and lists to be tested for empty.
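The text declares the four member functions without defining them. One possible set of definitions is sketched below. The destructor, the deleted copy operations, and the use of nullptr are assumptions added so that the sketch manages its storage safely, and remove_at_head assumes the list is not empty.

```cpp
class single_linked_list {
  private:
    class node {
      public:
        node *link;
        int contents;
    };
    node *head;
  public:
    single_linked_list() { head = nullptr; }
    // Assumed additions: release remaining nodes and forbid copying.
    ~single_linked_list() { while (!empty()) remove_at_head(); }
    single_linked_list(const single_linked_list&) = delete;
    single_linked_list& operator=(const single_linked_list&) = delete;
    void insert_at_head(int);
    void insert_at_tail(int);
    int remove_at_head();     // precondition: the list is not empty
    int empty();
};

void single_linked_list::insert_at_head(int value) {
    head = new node{head, value};        // new node links to the old head
}

void single_linked_list::insert_at_tail(int value) {
    node *fresh = new node{nullptr, value};
    if (head == nullptr) { head = fresh; return; }
    node *last = head;
    while (last->link != nullptr)        // walk to the final node
        last = last->link;
    last->link = fresh;
}

int single_linked_list::remove_at_head() {
    node *old = head;
    int value = old->contents;
    head = old->link;
    delete old;                          // explicit deallocation of the cell
    return value;
}

int single_linked_list::empty() { return head == nullptr; }
```

Insertion at the tail walks the whole list, so it takes time proportional to the list length; keeping a tail pointer would avoid that at the cost of a second data member.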
The following definitions provide stack and queue classes, both based on the single_linked_list class:
class stack : public single_linked_list {
  public:
    stack() {}
    void push(int value) {
      insert_at_head(value);
    }
    int pop() {
      return remove_at_head();
    }
};
class queue : public single_linked_list {
  public:
    queue() {}
    void enqueue(int value) {
      insert_at_tail(value);
    }
    int dequeue() {
      return remove_at_head();
    }
};
Note that objects of both the stack and queue subclasses can access the empty function defined in the base class, single_linked_list (because it is a public derivation). Both subclasses define constructor functions that do nothing. When an object of a subclass is created, the proper constructor in the subclass is implicitly called, and before its body executes, any applicable constructor in the base class is called. So, in our example, when an object of type stack is created, the constructor in single_linked_list runs first and does the necessary initialization. Then the body of the stack constructor runs, which does nothing.
The classes stack and queue both suffer from the same serious problem: Clients of both can access all of the public members of the parent class, single_linked_list. A client of a stack object could call insert_at_tail, thereby destroying the integrity of its stack. Likewise, a client of a queue object could call insert_at_head. These unwanted accesses are allowed because both stack and queue are subtypes of single_linked_list. Public derivation is used when one wants the subclass to inherit the entire interface of the base class. The alternative is to use a derivation in which the subclass inherits only the implementation of the base class. Our two example derived classes can be written to make them not subtypes of their parent class by using private, rather than public, derivation.11 Then, both will also need to reexport empty, because it will become hidden to their instances. This situation illustrates the motivation for the private-derivation option. The new definitions of the stack and queue types, named stack_2 and queue_2, are shown in the following:
class stack_2 : private single_linked_list {
  public:
    stack_2() {}
    void push(int value) {
      single_linked_list :: insert_at_head(value);
    }
    int pop() {
      return single_linked_list :: remove_at_head();
    }
    using single_linked_list :: empty;   // reexport empty
};
class queue_2 : private single_linked_list {
  public:
    queue_2() {}
    void enqueue(int value) {
      single_linked_list :: insert_at_tail(value);
    }
    int dequeue() {
      return single_linked_list :: remove_at_head();
    }
    using single_linked_list :: empty;   // reexport empty
};
These two classes use reexportation to allow access to base class methods for clients. This was not necessary when public derivation was used.
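To try private derivation with reexportation in isolation, the following self-contained sketch replaces the linked list with a hypothetical stand-in, int_list, built on std::deque. The names int_list, stack_3, and queue_3 are assumptions for illustration, and the reexport is written as a using-declaration, the form accepted by current compilers.

```cpp
#include <deque>

// Hypothetical stand-in for single_linked_list.
class int_list {
  public:
    void insert_at_head(int v) { items_.push_front(v); }
    void insert_at_tail(int v) { items_.push_back(v); }
    int remove_at_head() {               // precondition: not empty
        int v = items_.front();
        items_.pop_front();
        return v;
    }
    bool empty() const { return items_.empty(); }
  private:
    std::deque<int> items_;
};

// Private derivation: only the implementation is inherited.
class stack_3 : private int_list {
  public:
    void push(int v) { insert_at_head(v); }
    int pop() { return remove_at_head(); }
    using int_list::empty;               // reexport: visible to clients again
};

class queue_3 : private int_list {
  public:
    void enqueue(int v) { insert_at_tail(v); }
    int dequeue() { return remove_at_head(); }
    using int_list::empty;               // reexport
};
```

With these definitions a client can call push, pop, and empty on a stack_3 object, while a call such as insert_at_tail on that object no longer compiles.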
The two versions of stack and queue illustrate the difference between subtypes and derived types that are not subtypes. The linked list is a generalization of both stacks and queues, because both can be implemented as linked lists. So, it is natural to inherit from a linked-list class to define stack and queue classes. However, neither is a subtype of the linked-list class, because both make the public members of the parent class private, which makes them inaccessible to clients.
One of the reasons friends are necessary is that sometimes a subprogram must be written that can access the members of two different classes. For example, suppose a program uses a class for vectors and one for matrices, and a subprogram is needed to multiply a matrix object by a vector object. In C++, the multiply function can be made a friend of both classes.
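The matrix and vector situation can be sketched as follows, using hypothetical two-dimensional classes Matrix2 and Vector2 to keep the code short. The free function multiply is declared a friend in both classes, so it can read the private data of each; the equals member is an assumption added only so that results can be inspected.

```cpp
class Matrix2;   // forward declaration so Vector2 can name it

class Vector2 {
  public:
    Vector2(double x, double y) : x_(x), y_(y) {}
    bool equals(double x, double y) const { return x_ == x && y_ == y; }
    friend Vector2 multiply(const Matrix2& m, const Vector2& v);
  private:
    double x_, y_;
};

class Matrix2 {
  public:
    Matrix2(double a, double b, double c, double d)
        : a_(a), b_(b), c_(c), d_(d) {}
    friend Vector2 multiply(const Matrix2& m, const Vector2& v);
  private:
    double a_, b_, c_, d_;   // rows are (a b) and (c d)
};

// Not a member of either class, yet it may access the private
// members of both because each class names it as a friend.
Vector2 multiply(const Matrix2& m, const Vector2& v) {
    return Vector2(m.a_ * v.x_ + m.b_ * v.y_,
                   m.c_ * v.x_ + m.d_ * v.y_);
}
```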
As previously stated, C++ provides multiple inheritance. As an example, suppose we wanted a class for drawing that needed the behavior of a class written for drawing figures and the methods of the new class needed to run in a separate thread. We might define the following:
class Thread { . . . };
class Drawing { . . . };
class DrawThread : public Thread, public Drawing { . . . };
Class DrawThread inherits all of the members of both Thread and Drawing. If both Thread and Drawing happen to include members with the same name, they can be unambiguously referenced in objects of class DrawThread by using the scope resolution operator (::). This example of multiple inheritance is shown in Figure 12.5.
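The name-clash case can be made concrete with a small sketch. The member function name and its return values are hypothetical, and the Thread class here is only a placeholder that does not start a real thread.

```cpp
#include <string>

class Thread {
  public:
    std::string name() const { return "thread"; }
};

class Drawing {
  public:
    std::string name() const { return "drawing"; }
};

class DrawThread : public Thread, public Drawing {
  public:
    // An unqualified call to name() would be ambiguous here, so each
    // use is qualified with the scope resolution operator.
    std::string describe() const {
        return Thread::name() + "+" + Drawing::name();
    }
};
```

Clients face the same ambiguity: for a DrawThread object dt, the call dt.name() is rejected, while dt.Thread::name() and dt.Drawing::name() are both accepted.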
Some issues with the C++ implementation of multiple inheritance are discussed in Section 12.5.
Overriding methods12 in C++ must have exactly the same parameter profile as the overridden method. If there is any difference in the parameter profiles, the method in the subclass is considered a new method that is unrelated to the method with the same name in the ancestor class. The return type of the overriding method either must be the same as that of the overridden method or must be a publicly derived type of the return type of the overridden method.
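These rules can be illustrated with a hypothetical Animal and Dog hierarchy, which is not from the text. The sketch uses virtual functions and the C++11 override specifier, which asks the compiler to confirm that a method really overrides one in a base class.

```cpp
class Animal {
  public:
    virtual ~Animal() {}
    virtual int legs() const { return 0; }
    virtual int speed(int terrain) const { return terrain; }
    virtual Animal* clone() const { return new Animal(*this); }
};

class Dog : public Animal {
  public:
    // Same parameter profile: this overrides Animal::legs.
    int legs() const override { return 4; }

    // Different parameter profile: a new, unrelated method. Marking it
    // override would be a compile-time error.
    int speed(double terrain) const {
        return static_cast<int>(terrain) + 100;
    }

    // The return type is a publicly derived type of Animal*, so this
    // still overrides Animal::clone.
    Dog* clone() const override { return new Dog(*this); }
};
```

Through an Animal pointer to a Dog object, legs() and clone() reach the Dog versions, but speed(3) still runs the Animal version because Dog never overrode it.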
All of the member functions we have defined thus far are statically bound; that is, a call to one of them is statically bound to a function definition. A C++ object could be manipulated through a value variable, rather than a pointer or a reference. (Such an object would be static or stack dynamic.) However, in that case, the object’s type is known and static, so dynamic binding is not needed. On the other hand, a pointer variable that has the type of a base class can be used to point to any heap-dynamic object of any class publicly derived from that base class, making it a polymorphic variable. Publicly derived subclasses are subtypes if none of the members of the base class are private. Privately derived subclasses are never subtypes. A pointer to a base class cannot be used to reference a method in a subclass that is not a subtype.
C++ does not allow value variables (as opposed to pointers or references) to be polymorphic. When a polymorphic variable is used to call a member function overridden in one of the derived classes, the call must be dynamically bound to the correct member function definition.
Consider the situation of having a base class named Shape, along with a collection of derived classes for different kinds of shapes, such as circles, rectangles, and so forth. If these shapes need to be displayed, then the displaying member function, draw, must be unique for each descendant, or kind of shape. These versions of draw must be defined to be virtual. When a call to draw is made with a pointer to the base class of the derived classes, that call must be dynamically bound to the member function of the correct derived class. The following example has the skeletal definitions for the example situation just described:
class Shape {
  public:
    virtual void draw() = 0;
    . . .
};
class Circle : public Shape {
  public:
    void draw() { . . . }
    . . .
};
class Rectangle : public Shape {
  public:
    void draw() { . . . }
    . . .
};
Given these definitions, the following code has examples of both statically and dynamically bound calls:
Circle* circ = new Circle;
Rectangle* rect = new Rectangle;
Shape* ptr_shape;
ptr_shape = circ;    // Now ptr_shape points to a
                     //  Circle object
ptr_shape->draw();   // Dynamically bound to the draw
                     //  in the Circle class
rect->draw();        // Statically bound to the draw
                     //  in the Rectangle class
This situation is shown in Figure 12.6.
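A runnable variant of the skeleton is sketched below. As an assumption made purely so the chosen version can be observed, draw returns a string naming the shape instead of returning void; the virtual destructor and the helper function draw_via_base are also additions.

```cpp
#include <string>

class Shape {
  public:
    virtual ~Shape() {}                 // assumed addition for safe deletion
    virtual std::string draw() = 0;     // pure virtual
};

class Circle : public Shape {
  public:
    std::string draw() override { return "circle"; }
};

class Rectangle : public Shape {
  public:
    std::string draw() override { return "rectangle"; }
};

// The static type of ptr_shape is Shape*, so which draw runs is
// decided at run time from the object actually pointed to.
std::string draw_via_base(Shape* ptr_shape) {
    return ptr_shape->draw();
}
```

Passing a Circle* or a Rectangle* to draw_via_base selects the matching draw without any change to that function.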
Notice that the draw function in the definition of the base class Shape is set to 0. This peculiar syntax is used to indicate that this member function is a pure virtual function, meaning that it has no body and it cannot be called. It must be redefined in derived classes if they call the function. The purpose of a pure virtual function is to provide the interface of a function without giving any of its implementation. Pure virtual functions are usually defined when an actual member function in the base class would not be useful. Recall that in Section 12.2.3, a base class Building was discussed, and each subclass described some particular kind of building. Each subclass had a draw method but none of these would be useful in the base class. So, draw would be a pure virtual function in the Building class.
Any class that includes a pure virtual function is an abstract class. In C++, an abstract class is not marked with a reserved word. An abstract class can include completely defined methods. Because of the presence of one or more pure virtual functions, it is illegal to instantiate an abstract class. In a strict sense, an abstract class is one that is used only to represent the characteristics of a type. C++ provides abstract classes to model these truly abstract classes. If a subclass of an abstract class does not redefine a pure virtual function of its parent class, that function remains as a pure virtual function in the subclass and the subclass is also an abstract class.
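The way abstractness propagates can be checked with the standard type trait std::is_abstract. In this sketch the classes Polygon and Triangle and the member function id are hypothetical additions to the Shape example.

```cpp
#include <type_traits>

class Shape {
  public:
    virtual ~Shape() {}
    virtual void draw() = 0;            // pure virtual: Shape is abstract
    int id() const { return 7; }        // a completely defined method
};

// Polygon does not redefine draw, so it remains abstract.
class Polygon : public Shape {};

// Triangle redefines draw, so it can be instantiated.
class Triangle : public Polygon {
  public:
    void draw() override {}
};

static_assert(std::is_abstract<Shape>::value, "Shape is abstract");
static_assert(std::is_abstract<Polygon>::value, "Polygon is still abstract");
static_assert(!std::is_abstract<Triangle>::value, "Triangle is concrete");
```

Declaring a variable of type Shape or Polygon would be rejected by the compiler, while a Triangle object can be created and can use the inherited id method.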
Abstract classes and inheritance together support a powerful technique for software development. They allow types to be hierarchically defined so that related types can be subclasses of truly abstract types that define their common abstract characteristics.
Dynamic binding allows the code that uses members like draw to be written before all or even any of the versions of draw are written. New derived classes could be added years later, without requiring any change to the code that uses such dynamically bound members. This is a highly useful feature of object-oriented languages.
Reference assignments for stack-dynamic objects are different from pointer assignments for heap-dynamic objects. For example, consider the following code, which uses the same class hierarchy as the last example:
Square sq;       // Allocate a Square object on the stack
Rectangle rect;  // Allocate a Rectangle object on the stack
rect = sq;       // Copies the data member values from the Square object
rect.draw();     // Calls the draw from the Rectangle object
In the assignment rect = sq, the member data from the object referenced by sq would be assigned to the data members of the object referenced by rect, but rect would still reference the Rectangle object. Therefore, the call to draw through the object referenced by rect would be that of the Rectangle class. If rect and sq were pointers to heap-dynamic objects, the same assignment would be a pointer assignment, which would make rect point to the Square object, and a call to draw through rect would be bound dynamically to the draw in the Square object.
It is natural to compare the object-oriented features of C++ with those of Smalltalk. The inheritance of C++ is more intricate than that of Smalltalk in terms of access control. By using both the access controls within the class definition and the derivation access controls, and also the possibility of friend functions and friend classes, the C++ programmer has highly detailed control over access to class members. Although C++ provides multiple inheritance and Smalltalk does not, there are many who feel that is not an advantage for C++. The downsides of multiple inheritance weigh heavily against its value. In fact, C++ is the only language discussed in this chapter that supports multiple inheritance. On the other hand, languages that provide alternatives to multiple inheritance, such as Java and C#, clearly have an advantage over Smalltalk in that area.
In C++, the programmer can specify whether static binding or dynamic binding is to be used. Because static binding is faster, this is an advantage for those situations where dynamic binding is not necessary. Furthermore, even the dynamic binding in C++ is fast when compared with that of Smalltalk. Binding a virtual member function call in C++ to a function definition has a fixed cost, regardless of how distant in the inheritance hierarchy the definition appears. Calls to virtual functions require only five more memory references than statically bound calls (Stroustrup, 1988). In Smalltalk, however, messages are always dynamically bound to methods, and the farther away in the inheritance hierarchy the correct method is, the longer it takes. The disadvantage of allowing the user to decide which bindings are static and which are dynamic is that the original design must include these decisions, which may have to be changed later.
The static type checking of C++ is an advantage over Smalltalk, in which all type checking is dynamic. A Smalltalk program can be written with messages to nonexistent methods, which are not discovered until the program is executed. A C++ compiler finds such errors. Compiler-detected errors are less expensive to repair than those found in testing.
Smalltalk is essentially typeless, meaning that all code is effectively generic. This provides a great deal of flexibility, but static type checking is sacrificed. C++ provides generic classes through its template facility (as described in Chapter 11), which retains the benefits of static type checking.
The primary advantage of Smalltalk lies in the elegance and simplicity of the language, which results from the single philosophy of its design. It is purely and completely devoted to the object-oriented paradigm, devoid of compromises necessitated by the whims of an entrenched user base. C++, on the other hand, is a large and complex language with no single philosophy as its foundation, except to support object-oriented programming and include the C user base. One of its most significant goals was to preserve the efficiency and flavor of C while providing the advantages of object-oriented programming. Some people feel that the features of this language do not always fit well together and that at least some of the complexity is unnecessary.
According to Chambers and Ungar (1991), Smalltalk ran a particular set of small C-style benchmarks at only 10 percent of the speed of optimized C. C++ programs require only slightly more time than equivalent C programs (Stroustrup, 1988). Given the great efficiency gap between Smalltalk and C++, it is little wonder that the commercial use of C++ is far more widespread than that of Smalltalk. There are other factors in this difference, but efficiency is clearly a strong argument in favor of C++. Of course, all of the compiled languages that support object-oriented programming run approximately 10 times faster than Smalltalk.
Because Java’s design of classes, inheritance, and methods is similar to that of C++, in this section we focus only on those areas in which Java differs from C++.
As with C++, Java supports both objects and nonobject data. However, in Java, only values of the primitive scalar types (Boolean, character, and the numeric types) are not objects. Java’s enumerations and arrays are objects. The reason Java has nonobjects is efficiency.
In Java, primitive values are implicitly coerced when they are put in object context. This coercion converts the primitive value to an object of the wrapper class of the primitive value’s type. For example, putting an int value or variable into object context causes the creation of an Integer object with the value of the int primitive. This coercion is called boxing.
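The boxing coercion described above can be seen in a short sketch. The class name BoxingDemo and the helper sum are hypothetical names chosen for illustration; only the implicit int-to-Integer conversion comes from the text.

```java
import java.util.ArrayList;
import java.util.List;

class BoxingDemo {
    // Adds up a list of Integer objects; each element is implicitly
    // unboxed back to an int when it is used in the arithmetic.
    static int sum(List<Integer> values) {
        int total = 0;
        for (Integer v : values) {
            total += v;
        }
        return total;
    }

    public static void main(String[] args) {
        List<Integer> values = new ArrayList<>();
        int n = 7;
        values.add(n);   // object context: n is boxed into an Integer
        values.add(35);  // the literal is boxed the same way
        Object o = n;    // assignment to Object also boxes
        System.out.println(o instanceof Integer);
        System.out.println(sum(values));
    }
}
```

A collection such as ArrayList can hold only object references, which is why the primitive values must be wrapped before they are stored.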
Whereas C++ classes can be defined to have no parent, that is not possible in Java. All Java classes must be subclasses of the root class, Object, or some class that is a descendant of Object. One advantage of this is that some commonly needed methods, such as toString and equals, can be defined in Object and inherited and used by all other classes.
All Java objects are explicit heap dynamic. Most are allocated with the new operator, but there is no explicit deallocation operator. Garbage collection is used for storage reclamation. Like many other language features, although garbage collection avoids some serious problems, such as dangling pointers, it can cause other problems. One such difficulty arises because the garbage collector deallocates, or reclaims the storage occupied by an object, but it does no more. For example, if an object has access to some resource other than heap memory, such as a file or a lock on a shared resource, the garbage collector does not reclaim these. For these situations, Java allows the inclusion of a special method, finalize, which is related to a C++ destructor function.
A finalize method is implicitly called when the garbage collector is about to reclaim the storage occupied by the object. The problem with finalize is that the time it (and the garbage collector) will run cannot be forced or even predicted. The alternative to using finalize to reclaim resources held by an object about to be garbage collected is to include a method that does the reclamation. The only problem with this is that all clients of the objects must be aware of this method and remember to call it.
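The alternative of an explicit reclamation method might look like the following sketch. It assumes the reclamation method is named close and that the class implements the standard AutoCloseable interface, so that a try-with-resources statement (available since Java 7) calls it automatically; TrackedResource and ReleaseDemo are hypothetical names, and the "resource" is only a flag.

```java
class TrackedResource implements AutoCloseable {
    private boolean open = true;

    boolean isOpen() {
        return open;
    }

    // The explicit reclamation method: releases the (pretend) resource.
    @Override
    public void close() {
        open = false;
    }
}

class ReleaseDemo {
    // Uses a resource and reports whether it is still open afterwards.
    static boolean openAfterUse() {
        TrackedResource r = new TrackedResource();
        try (TrackedResource inUse = r) {
            if (!inUse.isOpen()) {
                throw new IllegalStateException("resource should be open here");
            }
        } // close() is called here, at a predictable point
        return r.isOpen();
    }
}
```

With this arrangement the client no longer has to remember the call, provided it acquires the object in a try-with-resources statement; a client that creates the object any other way must still call close itself.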
In Java, a method can be defined to be final, which means that it cannot be overridden in any descendant class. When the final reserved word is specified on a class definition, it means the class cannot be subclassed. All of the methods in a final class are implicitly final, which means that the bindings of method calls to the methods of the class are statically bound.
The advantage of defining a class to be final is that no changes to the class are allowed. For example, String is a final class and because of that any method that receives a String reference in a parameter can depend on the stability of the meaning of String’s methods. The disadvantage is that defining a class to be final disallows reuses that require even minor modifications.
Java includes the annotation @Override, which informs the compiler to check to determine whether the following method overrides a method in an ancestor class. If it does not, the compiler issues an error message.
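The effect of final and @Override can be sketched as follows. Account and SavingsAccount are hypothetical classes; the commented-out lines show declarations that a Java compiler would reject, so they are left disabled to keep the sketch compilable.

```java
class Account {
    protected int balance = 100;

    // final: no descendant class may override this method.
    final int currentBalance() {
        return balance;
    }

    String describe() {
        return "Account";
    }
}

class SavingsAccount extends Account {
    // The compiler verifies that an ancestor really declares describe().
    @Override
    String describe() {
        return "Savings" + super.describe();
    }

    // Rejected if uncommented: no ancestor declares descibe (note the typo),
    // so @Override reports an error instead of silently adding a new method.
    // @Override
    // String descibe() { return "oops"; }

    // Rejected if uncommented: currentBalance is final in Account.
    // int currentBalance() { return 0; }
}
```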
Like C++, Java requires that the parent class constructor be called before the subclass constructor is called. If parameters are to be passed to the parent class constructor, that constructor must be explicitly called, as in the following example:
super(100, true);
If there is no explicit call to the parent class constructor, the compiler inserts a call to the zero-parameter constructor in the parent class.
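To put the super call in context, here is a minimal sketch with a hypothetical parent class Sensor, whose only constructor takes an int and a boolean, and a hypothetical subclass TemperatureSensor. Because Sensor has no zero-parameter constructor, omitting the explicit call would be a compile-time error in this case.

```java
class Sensor {
    final int limit;
    final boolean enabled;

    Sensor(int limit, boolean enabled) {
        this.limit = limit;
        this.enabled = enabled;
    }
}

class TemperatureSensor extends Sensor {
    final String unit;

    TemperatureSensor(String unit) {
        super(100, true); // explicit call, placed before the subclass's own initialization
        this.unit = unit;
    }
}
```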
Java does not support the private derivations of C++. One can surmise that the Java designers believed that subclasses should be subtypes, which they are not when private derivations are supported. Thus, they did not include them. So, Java’s subclasses can be subtypes.
Early versions of Java included a collection, Vector, which included a long list of methods for manipulating data in a collection construct. These versions of Java also included a subclass of Vector, Stack, which added methods for push and pop operations. Unfortunately, because Java does not have private derivation, all of the methods of Vector were also visible in the Stack class, which made Stack objects liable to a variety of operations that could invalidate those objects.
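The exposure described above can still be observed with java.util.Stack, which remains a subclass of Vector. The sketch below uses the inherited Vector method insertElementAt to place an element at the bottom of a stack, something no push or pop sequence could do; StackLeakDemo and corrupted are hypothetical names.

```java
import java.util.Stack;

class StackLeakDemo {
    // Pushes 1, 2, 3 and then slips 99 underneath them by calling
    // a method that Stack inherits from Vector.
    static Stack<Integer> corrupted() {
        Stack<Integer> s = new Stack<>();
        s.push(1);
        s.push(2);
        s.push(3);
        s.insertElementAt(99, 0); // inherited from Vector; bypasses the stack discipline
        return s;
    }

    public static void main(String[] args) {
        System.out.println(corrupted());
    }
}
```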
Java directly supports only single inheritance. However, it includes a kind of abstract class, called an interface, which provides partial support for multiple inheritance. An interface definition is similar to a class definition, except that it can contain only named constants and method declarations (not definitions). It cannot contain constructors, nonabstract methods, or variable declarations. So, an interface is no more than what its name indicates—it defines only the specification of a class. (Recall that a C++ abstract class can have instance variables and all but one of the methods can be completely defined.) A class does not inherit an interface; it implements it. In fact, a class can implement any number of interfaces. To implement an interface, the class must implement all of the methods whose specifications (but not bodies) appear in the interface definition.
An interface can be used to simulate multiple inheritance. A class can be derived from a class and implement an interface, with the interface taking the place of a second parent class. This is sometimes called mixin inheritance, because the constants and methods of the interface are mixed in with the methods and data inherited from the superclass, as well as any new data and/or methods defined in the subclass.
One more interesting capability of interfaces is that they provide another kind of polymorphism. This is because interfaces can be treated as types. For example, a method can specify a formal parameter that is an interface. Such a formal parameter can accept an actual parameter of any class that implements the interface, making the method polymorphic.
A nonparameter variable also can be declared to be of the type of an interface. Such a variable can reference any object of any class that implements the interface.
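Both uses of an interface as a type can be sketched briefly. The interface Drawable, the classes CircleShape and SquareShape, and the method Renderer.render are all hypothetical names introduced for this illustration.

```java
interface Drawable {
    String draw();
}

class CircleShape implements Drawable {
    public String draw() {
        return "circle";
    }
}

class SquareShape implements Drawable {
    public String draw() {
        return "square";
    }
}

class Renderer {
    // The formal parameter is of an interface type, so the method accepts
    // an object of any class that implements Drawable.
    static String render(Drawable d) {
        return "<" + d.draw() + ">";
    }

    public static void main(String[] args) {
        Drawable d = new CircleShape(); // nonparameter variable of interface type
        System.out.println(render(d));
        d = new SquareShape();          // the same variable, a different class
        System.out.println(render(d));
    }
}
```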
One of the problems with multiple inheritance occurs when a class is derived from two parent classes and both define a public method with the same name and protocol. This problem is avoided with interfaces. Although a class that implements an interface must provide definitions for all of the methods specified in the interface, if the class and the interface both include methods with the same name and protocol, the class need not reimplement that method. So, the method name conflicts that can occur with multiple inheritance cannot occur with single inheritance and interfaces. Furthermore, variable name conflicts are completely avoided because interfaces cannot define variables.
An interface is not a replacement for multiple inheritance, because in multiple inheritance there is code reuse, while interfaces provide no code reuse. This is an important difference, because code reuse is one of the primary benefits of inheritance. Java provides one way to partially avoid this deficiency. One of the implemented interfaces could be replaced by an abstract class, which could include code that could be inherited, thereby providing some code reuse.
One problem with interfaces being a replacement for multiple inheritance is the following: If a class attempts to implement two interfaces and both define methods that have the same name and protocol, there is no way to implement both in the class.
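A small sketch of this situation, using the hypothetical interfaces Exporter and Importer, which both declare run with the same protocol. The class compiles, but it can supply only one body for run, so a call through either interface type reaches the same code and the two specifications cannot be given separate implementations.

```java
interface Exporter {
    String run();
}

interface Importer {
    String run();
}

class Transfer implements Exporter, Importer {
    // The single body that answers calls made through either interface.
    @Override
    public String run() {
        return "transfer";
    }
}
```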
As an example of an interface, consider the sort method of the standard Java class Arrays. Any class that uses this method must provide an implementation of a method to compare the elements to be sorted. The generic Comparable interface provides the protocol for this comparing method, which is named compareTo. The code for the Comparable interface is as follows:
public interface Comparable <T> {
    public int compareTo(T b);
}
The compareTo method must return a negative integer if the object through which it is called belongs before the parameter object, zero if they are equal, and a positive integer if the parameter belongs before the object through which compareTo was called. The contents of any array of objects of a class that implements the Comparable interface can be sorted, as long as the class’s compareTo method returns the appropriate values. Interfaces have become a common substitute for multiple inheritance. Some form of interface is now part of C#, Swift, Ruby, and Ada.
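As a sketch of how a class might satisfy this protocol, the hypothetical class Version below implements Comparable and is then sorted with the standard library method java.util.Arrays.sort. The class, its fields, and the helper SortDemo.sorted are assumptions made for illustration.

```java
import java.util.Arrays;

class Version implements Comparable<Version> {
    final int major;
    final int minor;

    Version(int major, int minor) {
        this.major = major;
        this.minor = minor;
    }

    // Negative if this belongs before other, zero if equal, positive otherwise.
    @Override
    public int compareTo(Version other) {
        if (major != other.major) {
            return Integer.compare(major, other.major);
        }
        return Integer.compare(minor, other.minor);
    }

    @Override
    public String toString() {
        return major + "." + minor;
    }
}

class SortDemo {
    // Sorts a small array of Version objects and returns its printed form.
    static String sorted() {
        Version[] vs = { new Version(2, 0), new Version(1, 5), new Version(1, 2) };
        Arrays.sort(vs); // relies on Version.compareTo
        return Arrays.toString(vs);
    }
}
```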
In addition to interfaces, Java also supports abstract classes, similar to those of C++. The abstract methods of a Java abstract class are represented as just the method’s header, which includes the abstract reserved word. The abstract class is also marked abstract. Of course, abstract classes cannot be instantiated.
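A minimal sketch of a Java abstract class, borrowing the Building and draw names from the earlier discussion; the floors field and the House subclass are hypothetical additions.

```java
abstract class Building {
    private final int floors;

    Building(int floors) {
        this.floors = floors;
    }

    // A completely defined method, inherited by every subclass.
    int floors() {
        return floors;
    }

    // An abstract method: only the header appears here.
    abstract String draw();
}

class House extends Building {
    House() {
        super(2);
    }

    @Override
    String draw() {
        return "house with " + floors() + " floors";
    }
}

// new Building(3) would be rejected by the compiler: Building is abstract.
```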
Chapter 14 illustrates the use of interfaces in Java event handling.
In C++, a method must be defined as virtual to allow dynamic binding. In Java, all method calls are dynamically bound unless the called method has been defined as final, in which case it cannot be overridden and all bindings are static. Static binding is also used if the method is static or private, both of which disallow overriding.
Java has several varieties of nested classes, all of which have the advantage of being hidden from all classes in their package, except for the nesting class. Nonstatic classes that are nested directly in another class are called inner classes. Each instance of an inner class must have an implicit pointer to the instance of its nesting class to which it belongs. This gives the methods of the nested class access to all of the members of the nesting class, including the private members. Static nested classes do not have this pointer, so they cannot access members of the nesting class. Therefore, static nested classes in Java are like the nested classes of C++.
Though it seems odd in a static-scoped language, the members of the inner class, even the private members, are accessible in the outer class. Such references must include the variable that references the inner class object. For example, suppose the outer class creates an instance of the inner class with the following statement:
myInner = this.new Inner();
Then, if the inner class defines a variable named sum, it can be referenced in the outer class as myInner.sum.
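The two directions of access can be put together in one sketch. The names Inner, myInner, and sum follow the text; the enclosing class OuterDemo, its field base, and the methods add and total are hypothetical.

```java
class OuterDemo {
    private int base = 10;
    private Inner myInner;

    class Inner {
        private int sum = 0;

        void add(int n) {
            sum += n + base; // the inner class reads the outer object's private field
        }
    }

    int total() {
        myInner = this.new Inner();
        myInner.add(1);
        myInner.add(2);
        return myInner.sum; // the outer class reads the inner object's private field
    }
}
```

Each Inner object here is tied to the OuterDemo object that created it, which is what lets add refer to base without naming an outer object.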
An instance of a nested class can only exist within an instance of its nesting class. Nested classes can also be anonymous. Anonymous nested classes have complex syntax but are really only an abbreviated way to define a class that is used from just one location. An example of an anonymous nested class appears in Chapter 14.
A local nested class is defined in a method of its nesting class. Local nested classes are never defined with an access specifier (private or public). Their scope is always limited to their nesting class. A method in a local nested class can access the variables defined in its nesting class and the final variables defined in the method in which the local nested class is defined. The members of a local nested class are visible only in the method in which the local nested class is defined.
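The following sketch shows a local nested class that uses both kinds of variables mentioned above. The class names Tally and Counter, the field offset, and the local variable threshold are hypothetical.

```java
class Tally {
    private int offset = 5;

    int count(int[] data) {
        final int threshold = 3;

        // A local nested class: declared inside a method, with no access specifier.
        class Counter {
            int hits = 0;

            void visit(int x) {
                // offset belongs to the nesting class; threshold is a final
                // variable of the enclosing method.
                if (x + offset > threshold) {
                    hits++;
                }
            }
        }

        Counter c = new Counter();
        for (int x : data) {
            c.visit(x);
        }
        return c.hits; // Counter and its members are visible only inside count
    }
}
```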
Java’s design for supporting object-oriented programming is similar to that of C++, but it employs more consistent adherence to object-oriented principles. Java does not allow parentless classes and uses dynamic binding as the “normal” way to bind method calls to method definitions. This, of course, increases execution time slightly over languages in which many method bindings are static. At the time this design decision was made, however, most Java programs were interpreted, so interpretation time made the extra binding time insignificant. Access controls for the contents of a class definition are rather simple when compared with the jungle of access controls of C++, ranging from derivation controls to friend functions. Finally, Java uses interfaces to provide a form of support for multiple inheritance, which does not have all of the drawbacks of actual multiple inheritance.
C#’s support for object-oriented programming is similar to that of Java.
C# includes both classes and structs, with the classes being very similar to Java’s classes and the structs being somewhat less powerful constructs. One important difference is that structs are value types; that is, they are stack dynamic. This could cause the problem of object slicing, but this is prevented by the restriction that structs cannot be subclassed. More details of how C# structs differ from its classes appeared in Chapter 11.
C# uses the syntax of C++ for defining classes. For example,
public class NewClass : ParentClass { . . . }
A method inherited from the parent class can be replaced in the derived class by marking its definition in the subclass with new. The new method hides the method of the same name in the parent class to normal access. However, the parent class version can still be called by prefixing the call with base. For example,
base.Draw();
As with Java, subclasses can be subtypes. C#’s support for interfaces is the same as that of Java. It does not support multiple inheritance.
To allow dynamic binding of method calls to methods in C#, both the base method and its corresponding methods in derived classes must be specially marked. The base class method must be marked with virtual, as in C++. To make clear the intent of a method in a subclass that has the same name and protocol as a virtual method in an ancestor class, C# requires that such methods be marked override if they are to override the parent class virtual method. For example, the C# version of the C++ Shape class that appears in Section 12.4.2.3 is as follows:
public class Shape {
    public virtual void Draw() { . . . }
    . . .
}
public class Circle : Shape {
    public override void Draw() { . . . }
    . . .
}
public class Rectangle : Shape {
    public override void Draw() { . . . }
    . . .
}
public class Square : Rectangle {
    public override void Draw() { . . . }
    . . .
}
C# includes abstract methods similar to those of C++, except that they are specified with different syntax. For example, the following is a C# abstract method:
abstract public void Draw();
A class that includes at least one abstract method is an abstract class, and every abstract class must be marked abstract. Abstract classes cannot be instantiated. It follows that any subclass of an abstract class that will be instantiated must implement all abstract methods that it inherits.
As with Java, all C# classes are ultimately derived from a single root class, Object. The Object class defines a collection of methods, including ToString, Finalize, and Equals, which are inherited by all C# types.
A C# class that is directly nested in a class behaves like a Java static nested class (which is like a nested class in C++). Like C++, C# does not support nested classes that behave like the nonstatic nested classes of Java.
Because C# is a recently designed C-based object-oriented language, one should expect that its designers learned from their predecessors and duplicated the successes of the past and remedied some of the problems. One result of this, coupled with the few problems with Java, is that the differences between C#’s support for object-oriented programming and that of Java are relatively minor. The availability of structs in C#, which Java does not have, can be considered an improvement. Like that of Java, C#’s support for object-oriented programming is simpler than that of C++, which many consider an improvement.
As stated previously, Ruby is a pure object-oriented programming language in the sense of Smalltalk. Virtually everything in the language is an object and all computation is accomplished through message passing. Although programs have expressions that use infix operators and therefore have the same appearance as expressions in languages like Java, those expressions actually are evaluated through message passing. As is the case with Smalltalk, when one writes a + b, it is evaluated by sending the message + to the object referenced by a, passing a reference to the object b as a parameter. In other words, a + b is implemented as a.+ b.
Ruby class definitions differ from those of languages such as C++ and Java in that they are executable. Because of this, they are allowed to remain open during execution. A program can add members to a class any number of times, simply by providing secondary definitions of the class that include the new members. During execution, the current definition of a class is the union of all definitions of the class that have been executed. Method definitions are also executable, which allows a program to choose between two versions of a method definition during execution, simply by putting the two definitions in the then and else clauses of a selection construct.
Ruby objects are created with new, which implicitly calls a constructor. The usual constructor in a Ruby class is named initialize. A constructor in a subclass can initialize the data members of the parent class that have setters defined. This is done by calling super with the initial values as actual parameters. super calls the method in the parent class that has the same name as the method in which the call to super appears.
Ruby classes can be nested, but the nested class has no special access to the variables or methods of the nesting class.
All variables in Ruby are references to objects, and all are typeless. Recall that the names of all instance variables in Ruby begin with an at sign (@).
In a clear departure from the other common programming languages, access control in Ruby is different for data than it is for methods. All instance data has private access by default, and that cannot be changed. If external access to an instance variable is required, accessor methods must be defined. For example, consider the following skeletal class definition:
class MyClass
  # A constructor
  def initialize
    @one = 1
    @two = 2
  end
  # A getter for @one
  def one
    @one
  end
  # A setter for @one
  def one=(my_one)
    @one = my_one
  end
end # of class MyClass
The equal sign (=) attached to the name of the setter method means that its variable is assignable. So, all setter methods have equal signs attached to their names. The body of the one getter method illustrates the Ruby design of methods returning the value of the last expression evaluated when there is no return statement. In this case, the value of @one is returned.
Because getter and setter methods are so frequently needed, Ruby provides shortcuts for creating them. If one wants a class to have getter methods for the two instance variables, @one and @two, those getters can be specified with the single statement in the class:
attr_reader :one, :two
attr_reader is actually a function call, using :one and :two as the actual parameters. Preceding a variable with a colon (:) causes the variable name to be used, rather than dereferencing it to the object to which it refers. Instead of passing a value or an address, the text of the variable’s name is passed. This is exactly how macro parameters are passed.
The function that similarly creates setters is called attr_writer. This function has the same parameter profile as attr_reader.
The functions for creating getter and setter methods are so named because they provide the protocol for objects of the class, which then are called attributes. So, the attributes of a class define the data interface (the data made public through accessor methods) to objects of the class.
Class variables, which are specified by preceding their names with two at signs (@@), are private to the class and its instances. That privacy cannot be changed. Also, unlike global and instance variables, class variables must be initialized before they are used.
Subclasses are defined in Ruby using the less-than symbol (<), rather than the colon of C++. For example,
class MySubClass < BaseClass
One distinct thing about the method access controls of Ruby is that they can be changed in a subclass, simply by calling the access control functions. This means that two subclasses of a base class can be defined so that objects of one of the subclasses can access a method defined in the base class, but objects of the other subclass cannot. Also, this allows one to change the access of a publicly accessible method in the base class to a privately accessible method in the subclass.
Support for dynamic binding in Ruby is the same as it is in Smalltalk. Variables are not typed; rather, they are all references to objects of any class. So, all variables are polymorphic and all bindings of method calls to methods are dynamic.
Because Ruby is an object-oriented programming language in the purest sense, its support for object-oriented programming is obviously adequate. However, access control to class members is weaker than that of C++. Ruby does not support abstract classes or interfaces, although its mixins are closely related to interfaces. Finally, in large part because Ruby is interpreted, its execution efficiency is far worse than that of the compiled languages.
Table 12.1 summarizes how the designers of the languages in this section chose to deal with the design issues described in Section 12.3.
There are at least two parts of language support for object-oriented programming that pose interesting questions for language implementers: storage structures for instance variables and the dynamic bindings of messages to methods. In this section, we provide a brief look at these.
In C++, classes are defined as extensions of C’s record structures, called structs. This similarity suggests a storage structure for the instance variables of class instances: that of a record. This form of structure is called a class instance record (CIR). The structure of a CIR is static, so it is built at compile time and used as a template for the creation of the data of class instances. Every class has its own CIR. When a derivation takes place, the CIR for the subclass is a copy of that of the parent class, with entries for the new instance variables added at the end.
Because the structure of the CIR is static, access to all instance variables can be done as it is in records, using constant offsets from the beginning of the CIR instance. This makes these accesses as efficient as those for the fields of records.
Methods in a class that are statically bound need not be involved in the CIR for the class. However, methods that will be dynamically bound must have entries in this structure. Such entries could simply have a pointer to the code of the method, which must be set at object creation time. Calls to a method could then be connected to the corresponding code through this pointer in the CIR. The drawback to this technique is that every instance would need to store pointers to all dynamically bound methods that could be called from the instance.
Notice that the list of dynamically bound methods that can be called from an instance of a class is the same for all instances of that class. Therefore, the list of such methods needs to be stored only once. So the CIR for an instance needs only a single pointer to that list to enable it to find called methods. The storage structure for the list is often called a virtual method table (vtable). Method calls can be represented as offsets from the beginning of the vtable. Polymorphic variables of an ancestor class always reference the CIR of the correct type object, so getting to the correct version of a dynamically bound method is assured. Consider the following Java example, in which all methods are dynamically bound:
public class A {
    public int a, b;
    public void draw() { . . . }
    public int area() { . . . }
}
public class B extends A {
    public int c, d;
    public void draw() { . . . }
    public void sift() { . . . }
}
The CIRs for the A and B classes, along with their vtables, are shown in Figure 12.7. Notice that the method pointer for the area method in B’s vtable points to the code for A’s area method. The reason is that B does not override A’s area method, so if a client of B calls area, it is the area method inherited from A. On the other hand, the pointers for draw and sift in B’s vtable point to B’s draw and sift. The draw method is overridden in B and sift is defined as an addition in B.
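The arrangement of CIRs and vtables described above can be imitated in ordinary Java. The following is only a simulation under stated assumptions: the VTableSketch class, the Cir and Code types, the slot numbers, and the string-returning method bodies are all hypothetical, and a real compiler or virtual machine builds these structures itself.

```java
// Hypothetical simulation of class instance records (CIRs) and vtables.
// A real compiler or virtual machine builds these structures itself.
class VTableSketch {
    // Assumed slot numbers, fixed when the classes are "compiled".
    static final int DRAW = 0, AREA = 1, SIFT = 2;

    // The code reached through one vtable entry.
    interface Code { String run(Cir self); }

    // A simulated CIR: a vtable pointer followed by the instance data.
    static final class Cir {
        final Code[] vtable;  // shared by every instance of the class
        final int[] fields;   // instance variables at constant offsets
        Cir(Code[] vtable, int fieldCount) {
            this.vtable = vtable;
            this.fields = new int[fieldCount];
        }
    }

    // One vtable per class, stored only once.
    static final Code[] VTABLE_A = {
        self -> "A.draw",
        self -> "A.area"
    };

    // B overrides draw, inherits area from A, and adds sift at the end.
    static final Code[] VTABLE_B = {
        self -> "B.draw",
        VTABLE_A[AREA],
        self -> "B.sift"
    };

    static Cir newA() { return new Cir(VTABLE_A, 2); }  // fields a, b
    static Cir newB() { return new Cir(VTABLE_B, 4); }  // fields a, b, c, d

    // A dynamically bound call: follow the vtable pointer, index by slot.
    static String call(Cir object, int slot) {
        return object.vtable[slot].run(object);
    }

    public static void main(String[] args) {
        Cir polymorphic = newB();  // stands in for an A variable referencing a B
        System.out.println(call(polymorphic, DRAW));  // prints B.draw
        System.out.println(call(polymorphic, AREA));  // prints A.area
        System.out.println(call(newA(), DRAW));       // prints A.draw
    }
}
```

Because the slot number for draw is the same in both tables, the calling code does not need to know whether it holds an A or a B; the vtable pointer stored in the object selects the version.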
Multiple inheritance complicates the implementation of dynamic binding. Consider the following three C++ class definitions:
class A {
  public:
    int a;
    virtual void fun() { . . . }
    virtual void init() { . . . }
};
class B {
  public:
    int b;
    virtual void sum() { . . . }
};
class C : public A, public B {
  public:
    int c;
    virtual void fun() { . . . }
    virtual void dud() { . . . }
};
The C class inherits the variable a and the init method from the A class. It redefines the fun method, although both its fun and that of the parent class A are potentially visible through a polymorphic variable (of type A). From B, C inherits the variable b and the sum method. C defines its own variable, c, and defines an uninherited method, dud. A CIR for C must include A’s data, B’s data, and C’s data, as well as some means of accessing all visible methods. Under single inheritance, the CIR would include a pointer to a vtable that has the addresses of the code of all visible methods. With multiple inheritance, however, it is not that simple. There must be at least two different views available in the CIR—one for each of the parent classes, one of which includes the view for the subclass, C. This inclusion of the view of the subclass in the parent class’s view is just as in the implementation of single inheritance.
There must also be two vtables: one for the A and C view and one for the B view. The first part of the CIR for C in this case can be the C and A view, which begins with a vtable pointer for the methods of C and those inherited from A, and includes the data inherited from A. Following this in C’s CIR is the B view part, which begins with a vtable pointer for the virtual methods of B, which is followed by the data inherited from B and the data defined in C. The CIR for C is shown in Figure 12.8.
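The two-view layout can be written out as data to make the offsets concrete. This is a rough Java simulation, not what a C++ compiler emits: the MultiViewSketch class, the string labels, and in particular the order of entries inside each vtable are assumptions made for illustration.

```java
// Hypothetical simulation of the CIR for class C under multiple inheritance.
// The order of entries inside each vtable is an assumption.
class MultiViewSketch {
    // Vtable for the combined C and A view.
    static final String[] VTABLE_C_AND_A = { "C::fun", "A::init", "C::dud" };

    // Vtable for the B view.
    static final String[] VTABLE_B_VIEW = { "B::sum" };

    // The simulated CIR for a C object, one entry per slot.
    static final Object[] CIR_C = {
        VTABLE_C_AND_A,  // offset 0: vtable pointer for the C and A view
        "a",             // offset 1: data inherited from A
        VTABLE_B_VIEW,   // offset 2: vtable pointer for the B view
        "b",             // offset 3: data inherited from B
        "c"              // offset 4: data defined in C
    };

    // Where each view begins inside the CIR.
    static final int A_VIEW = 0;
    static final int B_VIEW = 2;

    // Resolve a virtual call made through a given view of the object.
    static String resolve(int viewOffset, int slot) {
        String[] vtable = (String[]) CIR_C[viewOffset];
        return vtable[slot];
    }

    public static void main(String[] args) {
        System.out.println(resolve(A_VIEW, 0));  // prints C::fun
        System.out.println(resolve(A_VIEW, 1));  // prints A::init
        System.out.println(resolve(B_VIEW, 0));  // prints B::sum
    }
}
```

Treating a C object as a B amounts to starting at offset B_VIEW rather than at the beginning of the record, which is why each view needs its own vtable pointer.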
A discussion of reflection is not a perfect fit for a chapter on object orientation, but it is an even worse fit for any other chapter of this book. So, this is where we put it.
In general, the later bindings take place in a programming language, the more flexible the language is. For example, the late binding of data types in scripting languages and functional languages allows their programs to be more generic than those in the static-typed languages. Likewise, the dynamic binding of method calls to methods that is part of the object-oriented languages allows their programs to be easier to maintain and extend. Among other things, reflection provides the possibility of late binding of calls to methods that are outside the inheritance hierarchy of the calling code.
A programming language that supports reflection allows its programs to have run-time access to their types and structure and to be able to dynamically modify their behavior. To allow a program to examine its types and structure, that information must be gathered by the compiler or interpreter and made available to the program. Just as information about the structure of a database is called metadata, the types and structure of a program are called metadata. The process of a program examining its metadata is called introspection. A program can modify its behavior dynamically in several different ways: it could change its metadata directly, it could use the metadata, or it could intercede in the execution of the program. The first of these is complicated; the second is less complex and is common among languages; the third often is called intercession.
Some of the primary uses of reflection are in the construction of software tools. A class browser needs to enumerate the classes of a program. Visual Integrated Development Environments can use type information to assist a developer in building type-correct code. Debuggers must be able to examine private fields and methods of classes. Test systems need to be able to discover all of the methods of a class to be sure that test data drives all of them.
To illustrate a relatively simple and common use of reflection, we pose the following problem. A zoo has a large area devoted to birds. The flight cage for each species includes a plaque that provides general information about the inhabitant’s species. Included on the plaque is a small screen onto which a visitor with special interest in the species can enter his or her entrance ticket number. At the exit to the bird exhibit, the visitor can again enter his or her ticket number on a small screen, which causes a computer to print pictures of the birds for which the visitor previously indicated particular interest.
The computer system that supports these activities has, for each of the birds on display, an object that includes a method that draws a picture of its bird. This seems simple enough. When a visitor selects a bird at its flight cage, the system places a reference to the object associated with that bird in a list. At the exit of the exhibit, the system calls the draw method of each object in the visitor’s list.
The process is complicated by the fact that the zoo supplies only some of the bird objects. Some of them are purchased from third-party vendors and some are donated by zoo benefactors. Because of the multiple sources of the bird objects, they do not have a common base class (other than Object) and do not implement a common interface, so what type of references can be saved?
One obvious solution is to make each bird class a subclass of a common base class. References of the base class type could be stored in the list and dynamic binding could be used to invoke the draw methods. The drawback of this approach is that every bird class would need to be modified to make it a subclass of the common base class. It would be better if new bird classes could simply be added to a code file without modification. Another possible solution would be to use instanceof and casting to determine the concrete types of the references. This would add much code to the system, increasing its complexity and cost of maintenance. A better solution is to use the dynamic binding that is possible with reflection.
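To see why the type-test-and-cast alternative scales poorly, consider the following minimal sketch. The Heron and Osprey classes and the drawBird method are hypothetical stand-ins for bird classes from unrelated sources; their draw methods return a string rather than drawing, purely to keep the sketch short.

```java
// A sketch of the type-test-and-cast alternative, with hypothetical classes.
class CastingSketch {
    // Bird classes from unrelated sources: no common base class
    // (other than Object) and no common interface.
    static class Heron {
        String draw() { return "heron picture"; }
    }
    static class Osprey {
        String draw() { return "osprey picture"; }
    }

    // Every bird class added to the system needs another branch here.
    static String drawBird(Object bird) {
        if (bird instanceof Heron) {
            return ((Heron) bird).draw();
        } else if (bird instanceof Osprey) {
            return ((Osprey) bird).draw();
        }
        throw new IllegalArgumentException(
            "Unknown bird class: " + bird.getClass().getName());
    }

    public static void main(String[] args) {
        Object[] visitorList = { new Osprey(), new Heron() };
        for (Object bird : visitorList) {
            System.out.println(drawBird(bird));
        }
    }
}
```

The chain of tests must be edited and recompiled whenever a vendor or benefactor supplies a new bird class, which is the maintenance cost the text refers to.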
Java provides limited support for reflection. The primary metadata class, which has the unfortunately confusing name Class, is defined in the java.lang package. The Java run-time system creates an instance of Class for each type (class, interface, array type, or primitive type) that the program uses. The Class class provides a collection of methods to examine the type information and members of the program objects. Class is the access point for all of the reflection API.
If the program has a reference to an object (not a primitive), the Class object of that object can be obtained by calling its getClass method. All classes inherit getClass from Object, from which all objects descend. Consider the following examples:
float[] totals = new float[100];
Class fltlist = totals.getClass();
Class stg = "hello".getClass();
The value of the variable fltlist will be the Class object of the totals array object. The value of stg will be the Class object of String (because "hello" is an instance of String).
If there is no object of a class, its Class object can be obtained through the class’s name by attaching .class to the name. For example, we could have the following:
Class stg = String.class;
If the class has no name, its Class object can still be obtained by attaching .class to the class definition. For example, consider the following:
Class intmat = int[][].class;
The .class modifier can also be attached to primitive types. Although float.getClass() is illegal, float.class is not.
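The different ways of obtaining a Class object can be tried together in one small program. The ClassObjectSketch class and its describe helper are hypothetical names introduced here; the calls themselves (getClass, getSimpleName, isArray, isPrimitive) are standard methods of Object and java.lang.Class.

```java
// A sketch that gathers the ways of obtaining a Class object.
class ClassObjectSketch {
    // Hypothetical helper: summarize what a Class object represents.
    static String describe(Class<?> cls) {
        return cls.getSimpleName()
            + (cls.isArray() ? " (array)" : "")
            + (cls.isPrimitive() ? " (primitive)" : "");
    }

    public static void main(String[] args) {
        float[] totals = new float[100];

        // From an existing object, through getClass
        Class<?> fltlist = totals.getClass();
        Class<?> stg = "hello".getClass();

        // From a type name, with no object required
        Class<?> intmat = int[][].class;
        Class<?> prim = float.class;

        System.out.println(describe(fltlist));    // prints float[] (array)
        System.out.println(describe(stg));        // prints String
        System.out.println(describe(intmat));     // prints int[][] (array)
        System.out.println(describe(prim));       // prints float (primitive)

        // Both routes lead to the same Class object for String.
        System.out.println(stg == String.class);  // prints true
    }
}
```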
Class provides four methods for obtaining the Method objects of a class. The getMethod method searches a class to find a specific public method defined in the class or inherited by the class. The getMethods method returns an array of all of the public methods defined in a class or inherited by the class. The getDeclaredMethod method searches for a specific method declared in a class, including private methods. The getDeclaredMethods method returns an array of all of the methods declared in a class, including private methods but excluding inherited ones.
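The difference between the public-and-inherited lookups and the declared-only lookups is easiest to see side by side. In this sketch the Base and Derived classes and the names helper are hypothetical; getMethods and getDeclaredMethods are the standard java.lang.Class methods.

```java
import java.lang.reflect.Method;
import java.util.Set;
import java.util.TreeSet;

// A sketch contrasting getMethods with getDeclaredMethods.
class MethodLookupSketch {
    static class Base {
        public void draw() { }
    }
    static class Derived extends Base {
        public void sift() { }
        private void helper() { }
    }

    // Hypothetical helper: collect the names of the given methods.
    static Set<String> names(Method[] methods) {
        Set<String> result = new TreeSet<>();
        for (Method m : methods) {
            result.add(m.getName());
        }
        return result;
    }

    public static void main(String[] args) {
        Set<String> publicOnes = names(Derived.class.getMethods());
        Set<String> declared = names(Derived.class.getDeclaredMethods());

        // Public methods, including those inherited from Base and Object
        System.out.println(publicOnes.contains("draw"));    // prints true
        System.out.println(publicOnes.contains("helper"));  // prints false

        // Methods declared in Derived itself, whatever their access
        System.out.println(declared.contains("helper"));    // prints true
        System.out.println(declared.contains("draw"));      // prints false
    }
}
```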
If the Class object of an object is known and a particular method defined by the class of the object is found, that method can be called through the Method object of the method with the invoke method. For example, if the Method object named method is found with getMethod, it can be called with the following:
method.invoke(...);
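When the target method takes parameters, their types are passed to getMethod and the actual arguments are passed to invoke after the receiver. The following sketch assumes a hypothetical Scaler class with a scale method; only getMethod and invoke are part of the Java reflection API.

```java
import java.lang.reflect.Method;

// A sketch of calling a method with parameters through reflection.
class InvokeSketch {
    // Hypothetical target class.
    static class Scaler {
        public int scale(int value, int factor) {
            return value * factor;
        }
    }

    // Find scale(int, int) on the target's class and call it dynamically.
    static Object callScale(Object target, int value, int factor) {
        try {
            Method method =
                target.getClass().getMethod("scale", int.class, int.class);
            // The receiver comes first, followed by the actual arguments.
            return method.invoke(target, value, factor);
        } catch (ReflectiveOperationException e) {
            // Covers a missing method, denied access, and exceptions
            // thrown by the invoked method itself.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(callScale(new Scaler(), 3, 4));  // prints 12
    }
}
```

The result comes back as an Object, so a primitive return value such as the int here is boxed as an Integer.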
We can now develop a solution in Java for the problem posed in Section 12.6.2. The heart of this application is a class that defines a method that is passed an Object reference. The method determines the class of the passed reference, finds a draw method of that class, and calls that method. The solution class is tested with a second class, ReflectTest, which creates an array of three Object references to classes that represent three different birds. Each of these defines a draw method that, when called, displays a message indicating that it was called. Then the test calls the class method, passing the elements of the array of references.
The caller method can raise three different exceptions, each of which is handled in the method.
// A project to illustrate dynamic method calling
// using reflection in Java
package reflect;
import java.lang.reflect.*;

// A class to test the Reflect class
// Creates three objects that represent different birds
// and calls a method that dynamically calls the draw
// methods of the three bird classes
public class ReflectTest {
    public static void main(String[] args) {
        Object[] birdList = new Object[3];
        birdList[0] = new Bird1();
        birdList[1] = new Bird2();
        birdList[2] = new Bird3();
        Reflect.callDraw(birdList[2]);
        Reflect.callDraw(birdList[0]);
        Reflect.callDraw(birdList[1]);
    }
}

// A class to define the method that dynamically calls the
// methods of a passed class object
class Reflect {
    public static void callDraw(Object birdObj) {
        Class<?> cls = birdObj.getClass();
        try {
            // Find the draw method of the given class
            Method method = cls.getMethod("draw");
            // Dynamically call the method
            method.invoke(birdObj);
        }
        // In case the given class does not support draw
        catch (NoSuchMethodException e) {
            throw new IllegalArgumentException(
                cls.getName() + " does not support draw");
        }
        // In case the callDraw cannot call draw
        catch (IllegalAccessException e) {
            throw new IllegalArgumentException(
                "Insufficient access permissions to call " +
                "draw in class " + cls.getName());
        }
        // In case draw throws an exception
        catch (InvocationTargetException e) {
            throw new RuntimeException(e);
        }
    }
}

class Bird1 {
    public void draw() {
        System.out.println("This is draw from Bird1");
    }
}

class Bird2 {
    public void draw() {
        System.out.println("This is draw from Bird2");
    }
}

class Bird3 {
    public void draw() {
        System.out.println("This is draw from Bird3");
    }
}
The output of this program is as follows:
This is draw from Bird3
This is draw from Bird1
This is draw from Bird2
Support for reflection in C# is similar to that of Java, with a few important differences. In C#, as in all .NET languages, the compiler places the intermediate code, written in Common Intermediate Language (CIL), in an assembly, which could include several files. An assembly also contains an assembly version number and the metadata for all classes defined in the assembly, as well as for all external classes it uses.
Instead of the java.lang.Class class, System.Type is used in .NET; instead of the java.lang.reflect package, the System.Reflection namespace is used. Rather than the getClass method, GetType is used to get the class of an instance. Also, the .NET languages use the typeof operator in place of the .class field used in Java. Following is a C# version of the Java project shown above:
using System;
using System.Reflection;
namespace TestReflect
{
    // A project to illustrate dynamic method calling
    // using reflection in C#

    // A class to test the Reflect class
    // Creates three objects that represent different birds
    // and calls a method that dynamically calls the draw
    // methods of the three bird classes
    public class ReflectTest {
        public static void Main(String[] args) {
            Object[] birdList = new Object[3];
            birdList[0] = new Bird1();
            birdList[1] = new Bird2();
            birdList[2] = new Bird3();
            Reflect.callDraw(birdList[2]);
            Reflect.callDraw(birdList[0]);
            Reflect.callDraw(birdList[1]);
        }
    }

    // A class to define the method that dynamically calls the
    // methods of a passed class object
    class Reflect {
        public static void callDraw(Object birdObj) {
            Type typ = birdObj.GetType();
            // Find the draw method of the given class
            MethodInfo method = typ.GetMethod("draw");
            // Dynamically call the method
            method.Invoke(birdObj, null);
        }
    }

    class Bird1 {
        public void draw() {
            Console.WriteLine("This is draw from Bird1");
        }
    }

    class Bird2 {
        public void draw() {
            Console.WriteLine("This is draw from Bird2");
        }
    }

    class Bird3 {
        public void draw() {
            Console.WriteLine("This is draw from Bird3");
        }
    }
}
Our simple example of dynamic method binding shows just one of the many uses of reflection.
In addition to the methods and fields of a class, the following program elements can be accessed with reflection in both Java and C#: class modifiers (such as public, static, and final), constructors, method parameter types, and implemented interfaces. Also, a description of the inheritance path of a class can be introspected. In C#, the names of the formal parameters of methods can always be discovered; in Java, they are available only if the class was compiled to retain them (the -parameters option of javac, in Java 8 and later).
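Several of these elements can be inspected in Java with a few standard calls. In the sketch below, the Drawable, Bird, and Finch types and the inheritancePath helper are hypothetical; getModifiers, getInterfaces, getDeclaredConstructors, getParameterTypes, and getSuperclass are methods of the reflection API.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;

// A sketch of introspecting program elements other than methods and fields.
class ElementsSketch {
    // Hypothetical types to inspect.
    interface Drawable { void draw(); }
    static class Bird { }
    static final class Finch extends Bird implements Drawable {
        public Finch(String name) { }
        public void draw() { }
    }

    // Hypothetical helper: walk the superclass chain up to Object.
    static List<String> inheritancePath(Class<?> cls) {
        List<String> path = new ArrayList<>();
        for (Class<?> c = cls; c != null; c = c.getSuperclass()) {
            path.add(c.getSimpleName());
        }
        return path;
    }

    public static void main(String[] args) {
        Class<?> cls = Finch.class;

        // Class modifiers
        System.out.println(Modifier.isFinal(cls.getModifiers()));    // prints true

        // Implemented interfaces
        System.out.println(cls.getInterfaces()[0].getSimpleName());  // prints Drawable

        // Constructors and their parameter types
        Constructor<?> ctor = cls.getDeclaredConstructors()[0];
        System.out.println(ctor.getParameterTypes()[0].getSimpleName());  // prints String

        // Inheritance path
        System.out.println(inheritancePath(cls));  // prints [Finch, Bird, Object]
    }
}
```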
One significant difference between Java’s reflection and that of C# is the System.Reflection.Emit namespace, which is part of .NET. This namespace provides the ability to create CIL code and an assembly to house that code. Java provides no such capability, although it can be done with tools from other suppliers.
Although reflection adds a variety of capabilities to the static-typed languages Java and C#, the user of reflection must be aware of its downsides:
Performance nearly always suffers with the use of reflection. Resolving types, methods, and fields at run time is a cost that nonreflective code does not pay. Also, when types are dynamically resolved, some optimizations cannot be done on the code.
Reflection exposes private fields and methods, which violates the rules of abstraction and information hiding, and also may result in unexpected side effects and adversely affect portability.
Although the advantage of early type checking is widely accepted, the late binding that is possible with reflection obviously negates that advantage.
Some reflective operations may not work when the code is run under a security manager, also making it non-portable. One such security environment is that of running applets. In most cases, if a problem can be solved without reflection, reflection should not be used.
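The information-hiding concern can be made concrete with a short Java sketch. The Account class and the overwriteBalance method are hypothetical; getDeclaredField, setAccessible, and setInt are standard reflection calls. The setAccessible call is the step a security manager or a module boundary may refuse, so this code is not guaranteed to work in every environment.

```java
import java.lang.reflect.Field;

// A sketch of reflection bypassing private access.
class PrivateAccessSketch {
    // Change a private field that ordinary code in this class cannot name.
    static void overwriteBalance(Account account, int newValue) {
        try {
            Field field = Account.class.getDeclaredField("balance");
            field.setAccessible(true);  // suppress the normal access check
            field.setInt(account, newValue);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Account account = new Account();
        System.out.println(account.balance());  // prints 100
        overwriteBalance(account, -5);
        System.out.println(account.balance());  // prints -5
    }
}

// Hypothetical class whose author intended balance to be hidden.
class Account {
    private int balance = 100;

    int balance() {
        return balance;
    }
}
```

Nothing in Account offers a way to set a negative balance, yet the reflective code does so, which is the kind of unexpected side effect the list above warns about.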
Reflection is an integral part of most dynamically typed languages. In LISP, reflection is routinely used and the dynamic construction and execution of code is not uncommon. In other interpreted languages, such as JavaScript, Perl, and Python, the symbol table is kept during interpretation, providing all useful type information.
In Python, for example, the type function returns the type of a given value. For example, type([7, 14, 21]) is list. The isinstance function returns True if its first parameter has the type named in its second parameter and False otherwise. For example, isinstance(17, int) returns True. The callable function is used to determine whether its argument can be called, as a function object can. The dir function returns the list of attributes, both data and methods, of its parameter object.
Object-oriented programming is based on three fundamental concepts: abstract data types, inheritance, and dynamic binding. Object-oriented programming languages support the paradigm with classes, methods, objects, and message passing.
The discussion of object-oriented programming languages in this chapter revolves around seven design issues: exclusivity of objects, subclasses and subtypes, type checking and polymorphism, single and multiple inheritance, dynamic binding, explicit or implicit deallocation of objects, and nested classes.
Smalltalk is a pure object-oriented language—everything is an object and all computation is accomplished through message passing. All type checking and binding of messages to methods is dynamic, and all inheritance is single. Smalltalk has no explicit deallocation operation.
C++ provides support for data abstraction, inheritance, and optional dynamic binding of messages to methods, along with all of the conventional features of C. This means that it has two distinct type systems. C++ provides multiple inheritance and explicit object deallocation. It includes a variety of access controls for the entities in classes, some of which prevent subclasses from being subtypes. Both constructor and destructor methods can be included in classes; both are usually implicitly called.
While Smalltalk’s dynamic type binding provides somewhat more programming flexibility than the hybrid language C++, it is far less efficient.
Unlike C++, Java is not a hybrid language; it is meant to support only object-oriented programming. Java has both primitive scalar types and classes. All objects are allocated from the heap and are accessed through reference variables. There is no explicit object deallocation operation—garbage collection is used. The only subprograms are methods, and they can be called only through objects or classes. Only single inheritance is directly supported, although a kind of multiple inheritance is possible using interfaces. All binding of messages to methods is dynamic, except in the case of methods that cannot be overridden. In addition to classes, Java includes packages as a second encapsulation construct.
C#, which is based on C++ and Java, supports object-oriented programming. Objects can be instantiated from either classes or structs. The struct objects are stack dynamic and do not support inheritance. Methods in a derived class can call the hidden methods of the parent class by prefixing the method name with base. Methods that can be overridden must be marked virtual, and the overriding methods must be marked with override. All classes (and all primitives) are derived from Object.
Ruby is an object-oriented scripting language in which all data are objects. As with Smalltalk, all objects are heap allocated and all variables are typeless references to objects. All constructors are named initialize. All instance data are private, but getter and setter methods can be easily included. The collection of all instance variables for which access methods have been provided forms the public interface to the class. Such instance data are called attributes. Ruby classes are dynamic in the sense that they are executable and can be changed at any time. Ruby supports only single inheritance.
The instance variables of a class are stored in a CIR, the structure of which is static. Subclasses have their own CIRs, as well as the CIR of their parent class. Dynamic binding is supported with a virtual method table, which stores pointers to specific methods. Multiple inheritance greatly complicates the implementation of CIRs and virtual method tables.
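The relationship between a CIR and a virtual method table can be simulated directly. The sketch below is only an illustration in Python, not how any particular compiler lays out storage: each CIR is modeled as a dictionary, each table as a list of function references, and all names (`a_draw`, `VTABLE_B`, `send`, and so on) are hypothetical.

```python
# Method implementations as plain functions that receive the CIR explicitly.
def a_draw(cir):
    return "A.draw"

def a_area(cir):
    return cir["x"] * cir["y"]

def b_draw(cir):  # overrides a_draw
    return "B.draw"

def b_sift(cir):  # method added by the subclass
    return cir["z"]

# One table per class; a subclass keeps the parent's slot order and appends.
DRAW, AREA, SIFT = 0, 1, 2
VTABLE_A = [a_draw, a_area]
VTABLE_B = [b_draw, a_area, b_sift]

def new_a(x, y):
    # CIR of A: a reference to the class's table plus A's instance variables.
    return {"vtable": VTABLE_A, "x": x, "y": y}

def new_b(x, y, z):
    # CIR of B: the parent's layout first, then B's own instance variable.
    cir = new_a(x, y)
    cir["vtable"] = VTABLE_B
    cir["z"] = z
    return cir

def send(cir, slot):
    # Dynamic binding: index the receiver's table, then call through it.
    return cir["vtable"][slot](cir)
```

For example, `send(new_b(2, 3, 4), DRAW)` selects `b_draw`, while `send(new_b(2, 3, 4), AREA)` reaches the inherited `a_area` through the same slot number that an `A` object would use.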
Reflection is a process by which a program can access its classes and types and possibly dynamically change them to affect program behavior. One of the primary uses of reflection is in the construction of software tools, such as visual program construction tools, debuggers, and test systems. The class and type information, called metadata, is collected by the compiler or interpreter for the language. In Java, the class information, such as the methods of the class, is available in the Class object of a class. Support for reflection in C#, which is similar to that of Java, is available in the System.Reflection namespace.
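Both halves of reflection, examining metadata and changing it, can be sketched briefly. The example below uses Python's standard `inspect` module rather than the Java or C# facilities named above, and the class `Greeter` and function `loud_greet` are hypothetical.

```python
import inspect


class Greeter:
    def greet(self):
        return "hello"

    def part(self):
        return "bye"


# Introspection: ask the class for the names of its methods at run time.
method_names = sorted(
    name for name, _ in inspect.getmembers(Greeter, inspect.isfunction)
)

g = Greeter()
before = g.greet()


def loud_greet(self):
    return "HELLO"


# Intercession: replace a method of the class while the program is running.
setattr(Greeter, "greet", loud_greet)
after = g.greet()  # the existing object now sees the new behavior
```

A test system could use the first step to discover which methods to call, which is the kind of tool-building use the paragraph describes.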
Describe the three characteristic features of object-oriented languages.
What is the difference between a class variable and an instance variable?
What is multiple inheritance?
What is a polymorphic variable?
What is an overriding method?
Describe a situation where dynamic binding is a great advantage over static binding.
What is a virtual method?
What is an abstract method? What is an abstract class?
Describe briefly the seven design issues used in this chapter for object-oriented languages.
What is a nesting class?
What is the message protocol of an object?
From where are Smalltalk objects allocated?
Explain how Smalltalk messages are bound to methods. When does this take place?
What type checking is done in Smalltalk? When does it take place?
What kind of inheritance, single or multiple, does Smalltalk support?
What are the two most important effects that Smalltalk has had on computing?
In essence, all Smalltalk variables are of a single type. What is that type?
From where can C++ objects be allocated?
How are C++ heap-allocated objects deallocated?
Are all C++ subclasses subtypes? If so, explain. If not, why not?
Under what circumstances is a C++ method call statically bound to a method?
What drawback is there to allowing designers to specify which methods can be statically bound?
What are the differences between private and public derivations in C++?
What is a friend function in C++?
What is a pure virtual function in C++?
How are parameters sent to a superclass’s constructor in C++?
What is the single most important practical difference between Smalltalk and C++?
How is the type system of Java different from that of C++?
From where can Java objects be allocated?
What is boxing?
How are Java objects deallocated?
Are all Java subclasses subtypes?
How are superclass constructors called in Java?
Under what circumstances is a Java method call statically bound to a method?
In what way do overriding methods in C# syntactically differ from their counterparts in C++?
How can the parent version of an inherited method that is overridden in a subclass be called in that subclass in C#?
How does Ruby implement primitive types, such as those for integer and floating-point data?
How are getter methods defined in a Ruby class?
What access controls does Ruby support for instance variables?
What access controls does Ruby support for methods?
Are all Ruby subclasses subtypes?
Does Ruby support multiple inheritance?
What does reflection allow a program to do?
In the context of reflection, what is metadata?
What is introspection?
What is intercession?
What class in Java stores information about classes in a program?
For what is the Java name extension .class used?
What does the Java getMethods method do?
For what is the C# namespace System.Reflection.Emit used?
What important part of support for object-oriented programming is missing in SIMULA 67?
Explain the principle of substitution.
Explain the ways subclasses can be created that are not subtypes.
Compare the dynamic binding of C++ and Java.
Compare the class entity access controls of C++ and Java.
Compare the multiple inheritance of C++ with that provided by interfaces in Java.
What is one programming situation where multiple inheritance has a significant advantage over interfaces?
Explain the two problems with abstract data types that are ameliorated by inheritance.
Describe the categories of changes that a subclass can make to its parent class.
Explain one disadvantage of inheritance.
Explain the advantages and disadvantages of having all values in a language be objects.
What exactly does it mean for a subclass to have an is-a relationship with its parent class?
Describe the issue of how closely the parameters of an overriding method must match those of the method it overrides.
Explain type checking in Smalltalk.
The designers of Java obviously thought it was not worth the additional efficiency of allowing any method to be statically bound, as is the case with C++. What are the arguments for and against the Java design?
What is the primary reason why all Java objects have a common ancestor?
What is the purpose of the finalize clause in Java?
What would be gained if Java allowed stack-dynamic objects as well as heap-dynamic objects? What would be the disadvantage of having both?
What are the differences between a C++ abstract class and a Java interface?
Explain why allowing a class to implement multiple interfaces in Java and C# does not create the same problems that multiple inheritance in C++ creates.
Study and explain the issue of why C# does not include Java’s nonstatic nested classes.
Can you define a reference variable for an abstract class? What use would such a variable have?
Compare the access controls for instance variables in Java and Ruby.
Compare the type error detection for instance variables in Java and Ruby.
Explain the downsides of reflection.
Rewrite the single_linked_list, stack_2, and queue_2 classes in Section 12.5.2 in Java and compare the result with the C++ version in terms of readability and ease of programming.
Repeat Programming Exercise 1 using Ruby.
Design and implement a C++ program that defines a base class A, which has a subclass B, which itself has a subclass C. The A class must implement a method, which is overridden in both B and C. You must also write a test class that instantiates A, B, and C and includes three calls to the method. One of the calls must be statically bound to A’s method. One call must be dynamically bound to B’s method, and one must be dynamically bound to C’s method. All of the method calls must be through a pointer to class A.
Write a program in C++ that calls both a dynamically bound method and a statically bound method a large number of times, timing the calls to both of the two. Compare the timing results and compute the difference of the time required by the two. Explain the results.
Repeat Programming Exercise 1 using Java, forcing static binding with final.
This chapter begins with introductions to the various kinds of concurrency at the subprogram, or unit level, and at the statement level. Included is a brief description of the most common kinds of multiprocessor computer architectures. Next, a lengthy discussion on unit-level concurrency is presented. This begins with a description of the fundamental concepts that must be understood before discussing the problems and challenges of language support for unit-level concurrency, specifically competition and cooperation synchronization. Next, the design issues for providing language support for concurrency are described. Following this is a detailed discussion of three major approaches to language support for concurrency: semaphores, monitors, and message passing. A pseudocode example program is used to demonstrate how semaphores can be used. Ada and Java are used to illustrate monitors; for message passing, Ada is used. The Ada features that support concurrency are described in some detail. Although tasks are the focus, protected objects (which are effectively monitors) are also discussed. Support for unit-level concurrency using threads in Java and C# is then discussed, including approaches to synchronization. This is followed by brief overviews of support for concurrency in several functional programming languages. The last section of the chapter is a brief discussion of statement-level concurrency, including an introduction to part of the language support provided for it in High-Performance Fortran.
Concurrency in software execution can occur at four different levels: instruction level (executing two or more machine instructions simultaneously), statement level (executing two or more high-level language statements simultaneously), unit level (executing two or more subprogram units simultaneously), and program level (executing two or more programs simultaneously). Because no language design issues are involved with them, instruction-level and program-level concurrency are not discussed in this chapter. Concurrency at both the subprogram and the statement levels is discussed, with most of the focus on the subprogram level.
At first glance, concurrency may appear to be a simple concept, but it presents significant challenges to the programmer, the programming language designer, and the operating system designer (because much of the support for concurrency is provided by the operating system).
Concurrent control mechanisms increase programming flexibility. They were originally invented to be used for particular problems faced in operating systems, but they are required for a variety of other programming applications. Web browsers, which are among the most commonly used programs, have a design that is based heavily on concurrency. Browsers must perform many different functions at the same time, among them sending and receiving data from Web servers, rendering text and images on the screen, and reacting to user actions with the mouse and the keyboard. Most contemporary browsers use the extra core processors that are part of many contemporary personal computers to perform some of their processing, for example the interpretation of client-side scripting code. Another example is the software systems that are designed to simulate actual physical systems that consist of multiple concurrent subsystems. For all of these kinds of applications, the programming language (or a library or at least the operating system) must support unit-level concurrency.
Statement-level concurrency is quite different from concurrency at the unit level. From a language designer’s point of view, statement-level concurrency is largely a matter of specifying how data should be distributed over multiple memories and which statements can be executed concurrently.
The goal of developing concurrent software is to produce scalable and portable concurrent algorithms. A concurrent algorithm is scalable if the speed of its execution increases when more processors are available. This is important because the number of processors sometimes increases with the new generations of machines. The algorithms must be portable because the lifetime of hardware is relatively short. Therefore, software systems should not depend on a particular architecture—that is, they should run efficiently on machines with different architectures.
The intention of this chapter is to discuss the aspects of concurrency that are most relevant to language design issues, rather than to present a definitive study of all of the issues of concurrency, including the development of concurrent programs. That would clearly be inappropriate for a book on programming languages.
A large number of different computer architectures have more than one processor and can support some form of concurrent execution. Before beginning to discuss concurrent execution of programs and statements, we briefly describe some of these architectures.
The first computers that had multiple processors had one general-purpose processor and one or more other processors, often called peripheral processors, that were used only for input and output operations. This architecture allowed those computers, which appeared in the late 1950s, to execute one program while concurrently performing input or output for that program or other programs.
By the early 1960s, there were machines that had multiple complete processors. These processors were used by the job scheduler of the operating system, which distributed separate jobs from a batch-job queue to the separate processors. Systems with this structure supported program-level concurrency.
In the mid-1960s, some machines appeared that had several identical partial processors. These were fed instructions from a single instruction stream. For example, some machines had two or more floating-point multipliers, while others had two or more complete floating-point arithmetic units. The compilers for these machines were required to determine which instructions could be executed concurrently and to schedule these instructions accordingly. Systems with this structure supported instruction-level concurrency.
In 1966, Michael J. Flynn suggested a categorization of computer architectures defined by whether the instruction and data streams were single or multiple. The names of these were widely used from the 1970s to the early 2000s. The two categories that used multiple data streams are defined as follows: Computers that have multiple processors that execute the same instruction simultaneously, each on different data, are called Single Instruction, Multiple Data (SIMD) architecture computers. In an SIMD computer, each processor has its own local memory. One processor controls the operation of the other processors. Because all of the processors, except the controller, execute the same instruction at the same time, no synchronization is required in the software. Perhaps the most widely used SIMD machines are a category of machines called vector processors. They have groups of registers that store the operands of a vector operation in which the same instruction is executed on the whole group of operands simultaneously. Originally, the kinds of programs that could most benefit from this architecture were in scientific computation, an area of computing that is often the target of multiprocessor machines. However, SIMD processors are now used for a variety of application areas, among them graphics and video processing. Until recently, most supercomputers were vector processors.
Computers that have multiple processors that operate independently but whose operations can be synchronized are called Multiple Instruction, Multiple Data (MIMD) computers. Each processor in an MIMD computer executes its own instruction stream. MIMD computers can appear in two distinct configurations: distributed and shared memory systems. The distributed MIMD machines, in which each processor has its own memory, can be either built in a single chassis or distributed, perhaps over a large area. The shared-memory MIMD machines obviously must provide some means of synchronization to prevent memory access clashes. Even distributed MIMD machines require synchronization to operate together on single programs. MIMD computers, which are more general than SIMD computers, support unit-level concurrency. The primary focus of this chapter is on language design for shared memory MIMD computers, which are often called multiprocessors.
With the advent of powerful but low-cost single-chip computers, it became possible to have large numbers of these microprocessors connected into physically small networks within a single chassis. These kinds of computers, which often use off-the-shelf microprocessors, are available from a number of different manufacturers.
One important reason why software has not evolved faster to make use of concurrent machines is that the power of processors has continually increased. One of the strongest motivations to use concurrent machines is to increase the speed of computation. However, two hardware factors have combined to provide faster computation, without requiring any change in the architecture of software systems. First, processor clock rates have become faster with each new generation of processors (the generations have appeared roughly every 18 months). Second, several different kinds of concurrency have been built into the processor architectures. Among these are the pipelining of instructions and data from the memory to the processor (instructions are fetched and decoded for future execution while the current instruction is being executed), the use of separate lines for instructions and data, prefetching of instructions and data, and parallelism in the execution of arithmetic operations. All of these are collectively called hidden concurrency. The result of the increases in execution speed is that there have been great productivity gains without requiring software developers to produce concurrent software systems.
However, the situation has changed. The end of the sequence of significant increases in the speed of individual processors is now near. Significant increases in computing power now result from increases in the number of processors, as in the large server systems run by Google and Amazon and in scientific research applications. Many other large computing tasks are now run on machines with large numbers of relatively small processors.
Another recent advance in computing hardware was the development of multiple processors on a single chip, such as with the Intel Core Duo and Core Quad chips, which is putting more pressure on software developers to make more use of the available multiple processor machines. If they do not, the concurrent hardware will be wasted and significant productivity gains will not be realized.
There are two distinct categories of concurrent unit control. The most natural category of concurrency is that in which, assuming that more than one processor is available, several program units from the same program literally execute simultaneously. This is physical concurrency. A slight relaxation of this concept of concurrency allows the programmer and the application software to assume that there are multiple processors providing actual concurrency, when in fact the actual execution of programs is taking place in interleaved fashion on a single processor. This is logical concurrency. From the programmer’s and language designer’s points of view, logical concurrency is the same as physical concurrency. It is the language implementor’s task, using the capabilities of the underlying operating system, to map the logical concurrency to the host hardware. Both logical and physical concurrencies allow the concept of concurrency to be used as a program design methodology. For the remainder of this chapter, the discussion will apply to both physical and logical concurrencies.
One useful technique for visualizing the flow of execution through a program is to imagine a thread laid on the statements of the source text of the program. Every statement reached on a particular execution is covered by the thread representing that execution. Visually following the thread through the source program traces the execution flow through the executable version of the program. Of course, in all but the simplest of programs, the thread follows a highly complex path that would be impossible to follow visually. Formally, a thread of control in a program is the sequence of program points reached as control flows through the program.
Programs that have coroutines (see Chapter 9) but no concurrent subprograms, though they are sometimes called quasi-concurrent, have a single thread of control. Programs executed with physical concurrency can have multiple threads of control. Each processor can execute one of the threads. Although logically concurrent program execution may actually have only a single thread of control, such programs can be designed and analyzed only by imagining them as having multiple threads of control. A program designed to have more than one thread of control is said to be multithreaded. When a multithreaded program executes on a single-processor machine, its threads are mapped onto a single thread. It becomes, in this scenario, a virtually multithreaded program.
Statement-level concurrency is a relatively simple concept. In a common use of statement-level concurrency, loops that include statements that operate on array elements are unwound so that the processing can be distributed over multiple processors. For example, a loop that executes 500 repetitions and includes a statement that operates on one of 500 array elements may be unwound so that each of 10 different processors can simultaneously process 50 of the array elements.
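The 500-element example can be sketched with a pool of ten workers, each handling a contiguous block of 50 elements. This is only an illustration in Python: the loop body (doubling each element) and the names `data`, `result`, and `process_chunk` are assumptions, and in CPython the worker threads provide logical rather than guaranteed physical concurrency.

```python
from concurrent.futures import ThreadPoolExecutor

data = list(range(500))
result = [0] * len(data)

WORKERS = 10
CHUNK = len(data) // WORKERS  # 50 elements per worker


def process_chunk(start):
    # Each worker runs the original loop body over its own 50 elements.
    for i in range(start, start + CHUNK):
        result[i] = data[i] * 2  # hypothetical loop body


with ThreadPoolExecutor(max_workers=WORKERS) as pool:
    # Chunk starts are 0, 50, 100, ..., 450; list() waits for all to finish.
    list(pool.map(process_chunk, range(0, len(data), CHUNK)))
```

The partition is safe without synchronization because no two workers write the same array element.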
There are at least four different reasons to design concurrent software systems. The first reason is the speed of execution of programs on machines with multiple processors. These machines provide an effective way of increasing the execution speed of programs, provided that the programs are designed to make use of the concurrent hardware. There are now a large number of installed multiple-processor computers, including many of the personal computers sold in the last few years. It is wasteful not to use this hardware capability.
The second reason is that even when a machine has just one processor, a program written to use concurrent execution can be faster than the same program written for sequential (nonconcurrent) execution. The requirement for this to happen is that the program is not compute bound (the sequential version does not fully utilize the processor).
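A program that is not compute bound spends much of its time waiting, and that waiting can overlap. The sketch below is a rough illustration in Python, in which `time.sleep` stands in for a hypothetical input or output wait; the function names and the 0.1-second delay are assumptions.

```python
import threading
import time


def fake_io(results, index):
    time.sleep(0.1)  # stands in for waiting on a disk or a network
    results[index] = index


def run_sequential(n):
    results = [None] * n
    start = time.perf_counter()
    for i in range(n):
        fake_io(results, i)  # each wait must finish before the next begins
    return results, time.perf_counter() - start


def run_concurrent(n):
    results = [None] * n
    threads = [
        threading.Thread(target=fake_io, args=(results, i)) for i in range(n)
    ]
    start = time.perf_counter()
    for t in threads:
        t.start()  # all waits are now in progress at once
    for t in threads:
        t.join()
    return results, time.perf_counter() - start


seq_results, seq_time = run_sequential(4)
con_results, con_time = run_concurrent(4)
```

Both versions compute the same results, but the concurrent one overlaps its four waits, so its elapsed time is close to one wait rather than four, even on a single processor.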
The third reason is that concurrency provides a different method of conceptualizing program solutions to problems. Many problem domains lend themselves naturally to concurrency in much the same way that recursion is a natural way to design solutions to some problems. Also, many programs are written to simulate physical entities and activities. In many cases, the system being simulated includes more than one entity, and the entities do whatever they do simultaneously—for example, aircraft flying in a controlled airspace, relay stations in a communications network, and the various machines in a factory. Software that uses concurrency must be used to simulate such systems accurately.
The fourth reason for using concurrency is to program applications that are distributed over several machines, either locally or through the Internet. Many machines, for example cars, have more than one built-in computer, each of which is dedicated to some specific task. In many cases, these collections of computers must synchronize their program executions. Internet games are another example of software that is distributed over multiple processors.
Concurrency is now used in numerous everyday computing tasks. Web servers process document requests concurrently. Web browsers now use secondary core processors to run graphic processing and to interpret programming code embedded in documents. In every operating system there are many concurrent processes being executed at all times, managing resources, getting input from keyboards, displaying output from programs, and reading and writing external memory devices. In short, concurrency has become a ubiquitous part of computing.
Before language support for concurrency can be considered, one must understand the underlying concepts of concurrency and the requirements for it to be useful. These topics are covered in this section.
A task is a unit of a program, similar to a subprogram, that can be in concurrent execution with other units of the same program. Each task in a program can support one thread of control. Tasks are sometimes called processes. In some languages, for example Java and C#, certain methods serve as tasks. Such methods are executed in objects called threads.
Three characteristics of tasks distinguish them from subprograms. First, a task may be implicitly started, whereas a subprogram must be explicitly called. Second, when a program unit invokes a task, in some cases it need not wait for the task to complete its execution before continuing its own. Third, when the execution of a task is completed, control may or may not return to the unit that started that execution.
Tasks fall into two general categories: heavyweight and lightweight. Simply stated, a heavyweight task executes in its own address space. Lightweight tasks all run in the same address space. It is easier to implement lightweight tasks than heavyweight tasks. Furthermore, lightweight tasks can be more efficient than heavyweight tasks, because less effort is required to manage their execution.
A task can communicate with other tasks through shared nonlocal variables, through message passing, or through parameters. If a task does not communicate with or affect the execution of any other task in the program in any way, it is said to be disjoint. Because tasks often work together to create simulations or solve problems and therefore are not disjoint, they must use some form of communication to either synchronize their executions or share data or both.
Synchronization is a mechanism that controls the order in which tasks execute. Two kinds of synchronization are required when tasks share data: cooperation and competition. Cooperation synchronization is required between task A and task B when task A must wait for task B to complete some specific activity before task A can begin or continue its execution. Competition synchronization is required between two tasks when both require the use of some resource that cannot be simultaneously used. Specifically, if task A needs to access shared data location x while task B is accessing x, task A must wait for task B to complete its processing of x. So, for cooperation synchronization, tasks may need to wait for the completion of specific processing on which their correct operation depends, whereas for competition synchronization, tasks may need to wait for the completion of any other processing by any task currently occurring on specific shared data.
A simple form of cooperation synchronization can be illustrated by a common problem called the producer-consumer problem. This problem originated in the development of operating systems, in which one program unit produces some data value or resource and another uses it. Produced data are usually placed in a storage buffer by the producing unit and removed from that buffer by the consuming unit. The sequence of stores to and removals from the buffer must be synchronized. The consumer unit must not be allowed to take data from the buffer if the buffer is empty. Likewise, the producer unit cannot be allowed to place new data in the buffer if the buffer is full. This is a problem of cooperation synchronization because the users of the shared data structure must cooperate if the buffer is to be used correctly.
Competition synchronization prevents two tasks from accessing a shared data structure at exactly the same time—a situation that could destroy the integrity of that shared data. To provide competition synchronization, mutually exclusive access to the shared data must be guaranteed.
To clarify the competition problem, consider the following scenario: Suppose task A has the statement TOTAL += 1, where TOTAL is a shared integer variable. Furthermore, suppose task B has the statement TOTAL *= 2. Task A and task B could try to change TOTAL at the same time.
At the machine language level, each task may accomplish its operation on TOTAL with the following three-step process:
Fetch the value of TOTAL.
Perform the arithmetic operation.
Put the new value back in TOTAL.
Without competition synchronization, given the previously described operations performed by tasks A and B on TOTAL, four different values could result, depending on the order of the steps of the operations. Assume TOTAL has the value 3 before either A or B attempts to modify it. If task A completes its operation before task B begins, the value will be 8, which is assumed here to be correct. But if both A and B fetch the value of TOTAL before either task puts its new value back, the result will be incorrect. If A puts its value back first, the value of TOTAL will be 6. This case is shown in Figure 13.1. If B puts its value back first, the value of TOTAL will be 4. Finally, if B completes its operation before task A begins, the value will be 7. A situation that leads to these problems is sometimes called a race condition, because two or more tasks are racing to use the shared resource and the behavior of the program depends on which task arrives first (and wins the race). The importance of competition synchronization should now be clear.
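The four outcomes can be checked mechanically. The following Python sketch is an illustration only, not part of the original text: it models each task as the three machine-level steps listed above (fetch, arithmetic, store) and runs every possible interleaving of the two tasks. The names `run` and `all_schedules` are hypothetical helpers introduced here.

```python
from itertools import combinations

def run(schedule, total=3):
    """Execute one interleaving; schedule is a string of 'A'/'B' task labels."""
    ops = {"A": lambda v: v + 1,   # task A: TOTAL += 1
           "B": lambda v: v * 2}   # task B: TOTAL *= 2
    step = {"A": 0, "B": 0}        # next step per task: 0 fetch, 1 compute, 2 store
    reg = {"A": None, "B": None}   # each task's private register
    for task in schedule:
        if step[task] == 0:        # fetch the value of TOTAL
            reg[task] = total
        elif step[task] == 1:      # perform the arithmetic operation
            reg[task] = ops[task](reg[task])
        else:                      # put the new value back in TOTAL
            total = reg[task]
        step[task] += 1
    return total

def all_schedules():
    # Choose which 3 of the 6 time slots belong to task A; the rest go to B.
    for a_slots in combinations(range(6), 3):
        yield "".join("A" if i in a_slots else "B" for i in range(6))

results = {}
for s in all_schedules():
    results.setdefault(run(s), []).append(s)

for value in sorted(results):
    print(value, len(results[value]), "interleavings")
```

Only the fully serialized schedules `AAABBB` and `BBBAAA` give 8 and 7; every schedule in which both tasks fetch before either stores gives 4 or 6, depending on which task stores last.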
One general method for providing mutually exclusive access (to support competition synchronization) to a shared resource is to consider the resource to be something that a task can possess and allow only a single task to possess it at a time. To gain possession of a shared resource, a task must request it. Possession will be granted only when no other task has possession. While a task possesses a resource, all other tasks are prevented from having access to that resource. When a task is finished with a shared resource that it possesses, it must relinquish that resource so it can be made available to other tasks.
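As a minimal sketch of this request, possess, and relinquish discipline, the following Python fragment uses the standard library's `threading.Lock` as the possession token for a shared counter. The choice of Python and the names `total_lock` and `add_many` are assumptions made for illustration; the text itself does not prescribe a particular mechanism at this point.

```python
import threading

total = 0
total_lock = threading.Lock()    # whoever holds this lock "possesses" total

def add_many(n):
    global total
    for _ in range(n):
        with total_lock:         # request possession; blocks while another task holds it
            total += 1           # fetch, add, and store with no other task in between
        # leaving the with-block relinquishes possession

threads = [threading.Thread(target=add_many, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(total)  # 40000: no update is lost
```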
Three methods of providing for mutually exclusive access to a shared resource are semaphores, which are discussed in Section 13.3; monitors, which are discussed in Section 13.4; and message passing, which is discussed in Section 13.5.
Mechanisms for synchronization must be able to delay task execution. Synchronization imposes an order of execution on tasks that is enforced with these delays. To understand what happens to tasks through their lifetimes, we must consider how task execution is controlled. Regardless of whether a machine has a single processor or more than one, there is always the possibility of there being more tasks than there are processors. A run-time system program called a scheduler manages the sharing of processors among the tasks. If there were never any interruptions and tasks all had the same priority, the scheduler could simply give each task a time slice, such as 0.1 second, and when a task’s turn came, the scheduler could let it execute on a processor for that amount of time. Of course, there are several events that complicate this, for example, task delays for synchronization and for input or output operations. Because input and output operations are very slow relative to the processor’s speed, a task is not allowed to keep a processor while it waits for completion of such an operation.
Tasks can be in several different states:
New: A task is in the new state when it has been created but has not yet begun its execution.
Ready: A ready task is ready to run but is not currently running. Either it has not been given processor time by the scheduler, or it had run previously but was blocked in one of the ways described in Paragraph 4 of this subsection. Tasks that are ready to run are stored in a queue that is often called the task-ready queue.
Running: A running task is one that is currently executing; that is, it has a processor and its code is being executed.
Blocked: A task that is blocked has been running, but that execution was interrupted by one of several different events, the most common of which is an input or output operation. In addition to input and output, some languages provide operations for the user program to specify that a task be blocked.
Dead: A dead task is no longer active in any sense. A task dies when its execution is completed or it is explicitly killed by the program.
A flow diagram of the states of a task is shown in Figure 13.2.
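The five states can be modeled as a small state machine. The Python sketch below is a hedged illustration: the transition table is inferred from the state descriptions above rather than copied from Figure 13.2, so it should be read as an assumption. In particular, it does not model a ready or blocked task being killed explicitly. The names `State`, `TRANSITIONS`, and `move` are hypothetical.

```python
from enum import Enum

class State(Enum):
    NEW = "new"
    READY = "ready"
    RUNNING = "running"
    BLOCKED = "blocked"
    DEAD = "dead"

# Assumed legal transitions, inferred from the prose descriptions of each state.
TRANSITIONS = {
    State.NEW: {State.READY},                               # task becomes eligible to run
    State.READY: {State.RUNNING},                           # scheduler grants a processor
    State.RUNNING: {State.READY, State.BLOCKED, State.DEAD},  # slice expires, blocks, or ends
    State.BLOCKED: {State.READY},                           # awaited event has occurred
    State.DEAD: set(),                                      # a dead task never runs again
}

def move(current, target):
    """Return target if the transition is allowed, otherwise raise ValueError."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition {current.name} -> {target.name}")
    return target

state = State.NEW
for nxt in (State.READY, State.RUNNING, State.BLOCKED, State.READY,
            State.RUNNING, State.DEAD):
    state = move(state, nxt)
print(state.name)  # DEAD
```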
One important issue in task execution is the following: How is a ready task chosen to move to the running state when the currently running task has become blocked or its time slice has expired? Several different algorithms have been used for making this choice, some based on specifiable priority levels. The algorithm that does the choosing is implemented in the scheduler.
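One of the simplest selection policies is round-robin with equal priorities, which matches the time-slice scheme described earlier. The Python sketch below is an assumption-laden toy, not a real scheduler: each task is a generator, one `next()` call stands for one time slice, and the hypothetical function `round_robin` always picks the task at the front of the task-ready queue.

```python
from collections import deque

def task(name, slices):
    """A toy task that needs `slices` time slices to complete."""
    for i in range(slices):
        yield f"{name}{i}"       # one unit of work per time slice

def round_robin(tasks):
    ready = deque(tasks)         # the task-ready queue
    trace = []
    while ready:
        current = ready.popleft()        # choose the task at the front
        try:
            trace.append(next(current))  # let it run for one time slice
            ready.append(current)        # slice expired: back to the ready queue
        except StopIteration:
            pass                         # task completed: it is now dead
    return trace

trace = round_robin([task("A", 3), task("B", 1), task("C", 2)])
print(trace)  # ['A0', 'B0', 'C0', 'A1', 'C1', 'A2']
```

A priority-based policy would replace the plain queue with a structure ordered by priority; blocking is not modeled here.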
Associated with the concurrent execution of tasks and the use of shared resources is the concept of liveness. In the environment of sequential programs, a program has the liveness characteristic if it continues to execute, eventually leading to completion. In more general terms, liveness means that if some event—say, program completion—is supposed to occur, it will occur, eventually. That is, progress is continually made. In a concurrent environment and with shared resources, the liveness of a task can cease to exist, meaning that the program cannot continue and thus will never terminate.
For example, suppose task A and task B both need the shared resources X and Y to complete their work. Furthermore, suppose that task A gains possession of X and task B gains possession of Y. After some execution, task A needs resource Y to continue, so it requests Y but must wait until B releases it. Likewise, task B requests X but must wait until A releases it. Neither relinquishes the resource it possesses, and as a result, both lose their liveness, guaranteeing that execution of the program will never complete normally. This particular kind of loss of liveness is called deadlock. Deadlock is a serious threat to the reliability of a program, and therefore its avoidance demands serious consideration in both language and program design.
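The scenario can be reproduced safely by giving each request a timeout, so that the demonstration terminates instead of hanging. In the Python sketch below, the two barriers and the `got_second` dictionary are hypothetical scaffolding added only to make the interleaving deterministic; with a plain `acquire()` in place of the timed one, both threads would wait forever.

```python
import threading

x_lock, y_lock = threading.Lock(), threading.Lock()   # resources X and Y
both_hold_first = threading.Barrier(2)   # both tasks hold their first resource
both_tried = threading.Barrier(2)        # both tasks have requested the second
got_second = {}

def worker(name, first, second):
    with first:                                  # gain possession of the first resource
        both_hold_first.wait()
        ok = second.acquire(timeout=0.2)         # request the other task's resource
        got_second[name] = ok
        if ok:
            second.release()
        both_tried.wait()                        # keep the first resource until both tried

a = threading.Thread(target=worker, args=("A", x_lock, y_lock))
b = threading.Thread(target=worker, args=("B", y_lock, x_lock))
a.start(); b.start()
a.join(); b.join()

print(got_second)  # both entries are False: neither request could be granted
```

A common way to avoid this situation is to make every task request the resources in the same fixed order, for example always X before Y.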
We are now ready to discuss some of the linguistic mechanisms for providing concurrent unit control.
In some cases, concurrency is implemented through libraries. Among these is OpenMP, an applications programming interface to support shared memory multiprocessor programming in C, C++, and Fortran on a variety of platforms. Our interest in this book, of course, is language support for concurrency. A number of languages have been designed to support concurrency, beginning with PL/I in the mid-1960s and including the contemporary languages Ada 95, Java, C#, F#, Python, and Ruby.1
The most important design issues for language support for concurrency have already been discussed at length: competition and cooperation synchronization. In addition to these, there are several design issues of secondary importance. Prominent among them is how an application can influence task scheduling. Also, there are the issues of how and when tasks start and end their executions, and how and when they are created.
Keep in mind that our discussion of concurrency is intentionally incomplete, and only the most important of the language design issues related to support for concurrency are discussed.
The following sections discuss three alternative approaches to the design issues for concurrency: semaphores, monitors, and message passing.
A semaphore is a simple mechanism that can be used to provide synchronization of tasks. Although semaphores are an early approach to providing synchronization, they are still used, both in contemporary languages and in library-based concurrency support systems. In the following paragraphs, we describe semaphores and discuss how they can be used for this purpose.
In an effort to provide competition synchronization through mutually exclusive access to shared data structures, Edsger Dijkstra devised semaphores in 1965 (Dijkstra, 1968b). Semaphores can also be used to provide cooperation synchronization.
To provide limited access to a data structure, guards can be placed around the code that accesses the structure. A guard is a linguistic device that allows the guarded code to be executed only when a specified condition is true. So, a guard can be used to allow only one task to access a particular shared data structure at a time. A semaphore is an implementation of a guard. Specifically, a semaphore is a data structure that consists of an integer and a queue that stores task descriptors. A task descriptor is a data structure that stores all of the relevant information about the execution state of a task.
An integral part of a guard mechanism is a procedure for ensuring that all attempted executions of the guarded code eventually take place. The typical approach is to have requests for access that occur when access cannot be granted be stored in the task descriptor queue, from which they are later allowed to leave and execute the guarded code. This is the reason a semaphore must have both a counter and a task descriptor queue.
The only two operations provided for semaphores were originally named P and V by Dijkstra, after the two Dutch words passeren (to pass) and vrijgeven (to release) (Andrews and Schneider, 1983). We will refer to these as wait and release, respectively, in the remainder of this section.
Through much of this chapter, we use the example of a shared buffer used by producers and consumers to illustrate the different approaches to providing cooperation and competition synchronization. For cooperation synchronization, such a buffer must have some way of recording both the number of empty positions and the number of filled positions in the buffer (to prevent buffer underflow and overflow). The counter component of a semaphore can be used for this purpose. One semaphore variable—for example, emptyspots—can use its counter to maintain the number of empty locations in a shared buffer used by producers and consumers, and another—say, fullspots—can use its counter to maintain the number of filled locations in the buffer. The queues of these semaphores can store the descriptors of tasks that have been forced to wait for access to the buffer. The queue of emptyspots can store producer tasks that are waiting for available positions in the buffer; the queue of fullspots can store consumer tasks waiting for values to be placed in the buffer.
Our example buffer is designed as an abstract data type in which all data enters the buffer through the subprogram DEPOSIT, and all data leaves the buffer through the subprogram FETCH. The DEPOSIT subprogram needs only to check with the emptyspots semaphore to see whether there are any empty positions. If there is at least one, it can proceed with the DEPOSIT, which must have the side effect of decrementing the counter of emptyspots. If the buffer is full, the caller to DEPOSIT must be made to wait in the emptyspots queue for an empty spot to become available. When the DEPOSIT is complete, the DEPOSIT subprogram increments the counter of the fullspots semaphore to indicate that there is one more filled location in the buffer.
The FETCH subprogram has the opposite sequence of DEPOSIT. It checks the fullspots semaphore to see whether the buffer contains at least one item. If it does, an item is removed and the emptyspots semaphore has its counter incremented by 1. If the buffer is empty, the calling task is put in the fullspots queue to wait until an item appears. When FETCH is finished, it must increment the counter of emptyspots.
The operations on semaphore types often are not direct—they are done through wait and release subprograms. Therefore, the DEPOSIT operation just described is actually accomplished in part by calls to wait and release. Note that wait and release must be able to access the task-ready queue.
The wait semaphore subprogram is used to test the counter of a given semaphore variable. If the value is greater than zero, the caller can carry out its operation. In this case, the counter value of the semaphore variable is decremented to indicate that there is now one fewer of whatever it counts. If the value of the counter is zero, the caller must be placed on the waiting queue of the semaphore variable, and the processor must be given to some other ready task.
The release semaphore subprogram is used by a task to allow some other task to have one of whatever the counter of the specified semaphore variable counts. If the queue of the specified semaphore variable is empty, which means no task is waiting, release increments its counter (to indicate there is one more of whatever is being controlled that is now available). If one or more tasks are waiting, release moves one of them from the semaphore queue to the ready queue.
The following are concise pseudocode descriptions of wait and release:
wait(aSemaphore)
  if aSemaphore’s counter > 0 then
    decrement aSemaphore’s counter
  else
    put the caller in aSemaphore’s queue
    attempt to transfer control to some ready task
    (if the task-ready queue is empty, deadlock occurs)
  end if
release(aSemaphore)
  if aSemaphore’s queue is empty (no task is waiting) then
    increment aSemaphore’s counter
  else
    put the calling task in the task-ready queue
    transfer control to a task from aSemaphore’s queue
  end if
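To make the counter-plus-queue structure concrete, here is a single-threaded Python model of a semaphore. It is a sketch under stated assumptions: tasks are represented only by their names, the class `ToySemaphore` and the global `ready_queue` are hypothetical, and `release` follows the prose description (a waiting task is moved to the ready queue) rather than transferring control. A real implementation would also have to make both operations uninterruptible.

```python
from collections import deque

class ToySemaphore:
    def __init__(self, count):
        self.count = count       # the integer counter
        self.queue = deque()     # descriptors (here just names) of waiting tasks

ready_queue = deque()            # stand-in for the task-ready queue

def wait(sem, task):
    """Return True if the task may proceed, False if it was queued."""
    if sem.count > 0:
        sem.count -= 1           # one fewer of whatever the semaphore counts
        return True
    sem.queue.append(task)       # the caller must wait in the semaphore's queue
    return False

def release(sem):
    if not sem.queue:            # no task is waiting
        sem.count += 1
    else:                        # wake one waiter by making it ready again
        ready_queue.append(sem.queue.popleft())

s = ToySemaphore(1)
first = wait(s, "T1")            # counter 1 -> 0, T1 proceeds
second = wait(s, "T2")           # counter is 0, T2 is queued
release(s)                       # T2 moves to the ready queue; counter stays 0
print(first, second, list(ready_queue), s.count)  # True False ['T2'] 0
```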
We can now present an example program that implements cooperation synchronization for a shared buffer. In this case, the shared buffer stores integer values and is a logically circular structure. It is designed for use by possibly multiple producer and consumer tasks.
The following pseudocode shows the definition of the producer and consumer tasks. Two semaphores are used to ensure against buffer underflow or overflow, thus providing cooperation synchronization. Assume that the buffer has length BUFLEN, and the routines that actually manipulate it already exist as FETCH and DEPOSIT. Accesses to the counter of a semaphore are specified by dot notation. For example, if fullspots is a semaphore, its counter is referenced by fullspots.count.
semaphore fullspots, emptyspots;
fullspots.count = 0;
emptyspots.count = BUFLEN;

task producer;
  loop
    -- produce VALUE --
    wait(emptyspots);     { wait for a space }
    DEPOSIT(VALUE);
    release(fullspots);   { increase filled spaces }
  end loop;
end producer;

task consumer;
  loop
    wait(fullspots);      { make sure it is not empty }
    FETCH(VALUE);
    release(emptyspots);  { increase empty spaces }
    -- consume VALUE --
  end loop;
end consumer;
The semaphore fullspots causes the consumer task to be queued to wait for a buffer entry if it is currently empty. The semaphore emptyspots causes the producer task to be queued to wait for an empty space in the buffer if it is currently full.
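The same structure can be run in Python with the standard library's `threading.Semaphore`, whose `acquire` and `release` methods play the roles of wait and release. This is a sketch, not the book's code: the circular-buffer helpers `deposit` and `fetch`, the indices `next_in` and `next_out`, and the `consumed` list are assumptions added so the example is self-contained. With exactly one producer and one consumer, each index is touched by only one thread, so no competition synchronization is needed here.

```python
import threading

BUFLEN = 4
buffer = [None] * BUFLEN         # logically circular buffer
next_in = next_out = 0

fullspots = threading.Semaphore(0)         # no filled positions yet
emptyspots = threading.Semaphore(BUFLEN)   # all positions are empty

def deposit(value):
    global next_in
    buffer[next_in] = value
    next_in = (next_in + 1) % BUFLEN

def fetch():
    global next_out
    value = buffer[next_out]
    next_out = (next_out + 1) % BUFLEN
    return value

consumed = []

def producer(n):
    for value in range(n):       # produce VALUE
        emptyspots.acquire()     # wait for a space
        deposit(value)
        fullspots.release()      # increase filled spaces

def consumer(n):
    for _ in range(n):
        fullspots.acquire()      # make sure it is not empty
        value = fetch()
        emptyspots.release()     # increase empty spaces
        consumed.append(value)   # consume VALUE

p = threading.Thread(target=producer, args=(20,))
c = threading.Thread(target=consumer, args=(20,))
p.start(); c.start()
p.join(); c.join()
print(consumed == list(range(20)))  # True: nothing lost, order preserved
```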
Our buffer example does not provide competition synchronization. Access to the structure can be controlled with an additional semaphore. This semaphore need not count anything but can simply indicate with its counter whether the buffer is currently being used. The wait statement allows the access only if the semaphore’s counter has the value 1, which indicates that the shared buffer is not currently being accessed. If the semaphore’s counter has a value of 0, there is a current access taking place, and the task is placed in the queue of the semaphore. Notice that the semaphore’s counter must be initialized to 1. The queues of semaphores must always be initialized to empty before use of the queue can begin.
A semaphore that requires only a binary-valued counter, like the one used to provide competition synchronization in the following example, is called a binary semaphore.
The example pseudocode that follows illustrates the use of semaphores to provide both competition and cooperation synchronization for a concurrently accessed shared buffer. The access semaphore is used to ensure mutually exclusive access to the buffer. Remember that there may be more than one producer and more than one consumer.
semaphore access, fullspots, emptyspots;
access.count = 1;
fullspots.count = 0;
emptyspots.count = BUFLEN;

task producer;
  loop
    -- produce VALUE --
    wait(emptyspots);     { wait for a space }
    wait(access);         { wait for access }
    DEPOSIT(VALUE);
    release(access);      { relinquish access }
    release(fullspots);   { increase filled spaces }
  end loop;
end producer;

task consumer;
  loop
    wait(fullspots);      { make sure it is not empty }
    wait(access);         { wait for access }
    FETCH(VALUE);
    release(access);      { relinquish access }
    release(emptyspots);  { increase empty spaces }
    -- consume VALUE --
  end loop;
end consumer;
A brief look at this example may lead one to believe that there is a problem. Specifically, suppose that while a task is waiting at the wait(access) call in consumer, another task takes the last value from the shared buffer. Fortunately, this cannot happen, because the wait(fullspots) reserves a value in the buffer for the task that calls it by decrementing the fullspots counter.
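A hedged Python rendering of the three-semaphore version, with two producers and two consumers, shows both properties at once: no value is lost or duplicated, and the buffer never holds more than BUFLEN items. The bookkeeping names `consumed` and `max_seen` are hypothetical additions for checking the result, and `collections.deque` stands in for the DEPOSIT and FETCH routines.

```python
import threading
from collections import deque

BUFLEN = 3
buffer = deque()
access = threading.Semaphore(1)            # binary semaphore: competition
fullspots = threading.Semaphore(0)         # cooperation: filled positions
emptyspots = threading.Semaphore(BUFLEN)   # cooperation: empty positions
consumed = []
max_seen = 0                               # largest buffer size observed

def producer(start, n):
    global max_seen
    for value in range(start, start + n):
        emptyspots.acquire()     # wait for a space
        access.acquire()         # wait for access
        buffer.append(value)     # DEPOSIT
        max_seen = max(max_seen, len(buffer))
        access.release()         # relinquish access
        fullspots.release()      # increase filled spaces

def consumer(n):
    for _ in range(n):
        fullspots.acquire()      # reserves one value for this task
        access.acquire()         # wait for access
        consumed.append(buffer.popleft())   # FETCH, recorded while access is held
        access.release()         # relinquish access
        emptyspots.release()     # increase empty spaces

tasks = [threading.Thread(target=producer, args=(0, 10)),
         threading.Thread(target=producer, args=(10, 10)),
         threading.Thread(target=consumer, args=(10,)),
         threading.Thread(target=consumer, args=(10,))]
for t in tasks:
    t.start()
for t in tasks:
    t.join()

print(sorted(consumed) == list(range(20)), max_seen <= BUFLEN)  # True True
```

The order of the two acquisitions matters: taking `access` before `emptyspots` or `fullspots` could leave a task holding the buffer while it waits for a position that only another task, now locked out, can provide.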
There is one crucial aspect of semaphores that thus far has not been discussed. Recall the earlier description of the problem of competition synchronization: Operations on shared data must not overlap. If a second operation begins while an earlier operation is still in progress, the shared data can become corrupted. A semaphore is itself a shared data object, so the operations on semaphores are also susceptible to the same problem. It is therefore essential that semaphore operations be uninterruptible. Many computers have uninterruptible instructions that were designed specifically for semaphore operations. If such instructions are not available, then using semaphores to provide competition synchronization is a serious problem with no simple solution.
Using semaphores to provide cooperation synchronization creates an unsafe programming environment. There is no way to check statically for the correctness of their use, which depends on the semantics of the program in which they appear. In the buffer example, leaving out the wait(emptyspots) statement of the producer task would result in buffer overflow. Leaving out the wait(fullspots) statement of the consumer task would result in buffer underflow. Leaving out either of the releases would result in deadlock. These are cooperation synchronization failures.
The reliability problems that semaphores cause in providing cooperation synchronization also arise when using them for competition synchronization. Leaving out the wait(access) statement in either task can cause insecure access to the buffer. Leaving out the release(access) statement in either task results in deadlock. These are competition synchronization failures. Noting the danger in using semaphores, Per Brinch Hansen (1973) wrote, “The semaphore is an elegant synchronization tool for an ideal programmer who never makes mistakes.” Unfortunately, ideal programmers are rare.
One solution to some of the problems of semaphores in a concurrent environment is to encapsulate shared data structures with their operations and hide their representations—that is, to make shared data structures abstract data types with some special restrictions. This solution can provide competition synchronization without semaphores by transferring responsibility for synchronization to the run-time system.
When the concepts of data abstraction were being formulated, the people involved in that effort applied the same concepts to shared data in concurrent programming environments to produce monitors. According to Per Brinch Hansen (Brinch Hansen, 1977, p. xvi), Edsger Dijkstra suggested in 1971 that all synchronization operations on shared data be gathered into a single program unit. Brinch Hansen (1973) formalized this concept in the environment of operating systems. The following year, Hoare (1974) named these structures monitors.
The first programming language to incorporate monitors was Concurrent Pascal (Brinch Hansen, 1975). Modula (Wirth, 1977), CSP/k (Holt et al., 1978), and Mesa (Mitchell et al., 1979) also provide monitors. Among contemporary languages, monitors are supported by Ada, Java, and C#, all of which are discussed later in this chapter.
One of the most important features of monitors is that shared data is resident in the monitor rather than in any of the client units. The programmer does not synchronize mutually exclusive access to shared data through the use of semaphores or other mechanisms. Because the access mechanisms are part of the monitor, implementation of a monitor can be made to guarantee synchronized access by allowing only one access at a time. Calls to monitor procedures are implicitly blocked and stored in a queue if the monitor is busy at the time of the call.
Although mutually exclusive access to shared data is intrinsic with a monitor, cooperation between processes is still the task of the programmer. In particular, the programmer must guarantee that a shared buffer does not experience underflow or overflow. Different languages provide different ways of programming cooperation synchronization, all of which are related to semaphores.
Figure 13.3 depicts a program containing four tasks and a monitor that provides synchronized access to a concurrently shared buffer. In this figure, the interface to the monitor is shown as the two boxes labeled insert and remove (for the insertion and removal of data). The monitor appears exactly like an abstract data type, that is, a data structure with limited access, which is what a monitor is.
Monitors are a better way to provide competition synchronization than semaphores are, primarily because of the problems of semaphores discussed in Section 13.3. Cooperation synchronization is still a problem with monitors, as will be clear when the Ada and Java implementations of monitors are discussed in the following sections.
Semaphores and monitors are equally powerful at expressing concurrency control—semaphores can be used to implement monitors and monitors can be used to implement semaphores.
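One direction of this equivalence can be sketched in Java, whose synchronized methods together with wait and notify behave like a monitor. The class MonitorSemaphore below is a hypothetical illustration of a counting semaphore built only from those monitor facilities; it is a minimal sketch, not a replacement for the library class java.util.concurrent.Semaphore.

```java
// A counting semaphore built from Java's monitor facilities (synchronized, wait, notify).
class MonitorSemaphore {
    private int permits;

    MonitorSemaphore(int initialPermits) {
        permits = initialPermits;
    }

    // Corresponds to wait(s): block until a permit is available, then take it.
    synchronized void acquire() throws InterruptedException {
        while (permits == 0) {
            wait();            // gives up the monitor lock while waiting
        }
        permits--;
    }

    // Corresponds to release(s): return a permit and wake one waiting thread.
    synchronized void release() {
        permits++;
        notify();
    }

    synchronized int available() {
        return permits;
    }
}

class MonitorSemaphoreDemo {
    // A worker acquires the single permit that the caller releases;
    // returns the number of permits left afterward.
    static int run() {
        MonitorSemaphore s = new MonitorSemaphore(0);
        Thread worker = new Thread(() -> {
            try {
                s.acquire();   // blocks if no permit has been released yet
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        worker.start();
        s.release();
        try {
            worker.join();
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
        return s.available();
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

The while loop around wait rechecks the condition after every wakeup, which keeps the sketch correct even if a thread is awakened when no permit is available.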
Ada provides two ways to implement monitors. Ada 83 includes a general tasking model that can be used to support monitors. Ada 95 added a cleaner and more efficient way of constructing monitors, called protected objects. Both of these approaches use message passing as a basic model for supporting concurrency. The message-passing model allows concurrent units to be distributed, which monitors do not allow. Message passing is described in Section 13.5; Ada support for message passing is discussed in Section 13.6.
In Java, a monitor can be implemented in a class designed as an abstract data type, with the shared data being the type. Accesses to objects of the class are controlled by adding the synchronized modifier to the access methods. An example of a monitor for the shared buffer written in Java is given in Section 13.7.4.
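As a minimal sketch of this idea, assuming a hypothetical class AccountMonitor, the shared data is a private field and every access method carries the synchronized modifier. The caller writes no explicit locking code; mutual exclusion is supplied by the run-time system.

```java
// Hypothetical monitor: the shared data (balance) lives inside the class and is private.
class AccountMonitor {
    private long balance = 0;

    // synchronized: at most one thread at a time runs a synchronized method of this object
    synchronized void deposit(long amount) {
        balance += amount;
    }

    synchronized long balance() {
        return balance;
    }
}

class AccountDemo {
    // Starts several depositor threads and returns the final balance.
    static long run(int threads, int depositsPerThread) {
        AccountMonitor account = new AccountMonitor();
        Thread[] depositors = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            depositors[i] = new Thread(() -> {
                for (int j = 0; j < depositsPerThread; j++) account.deposit(1);
            });
            depositors[i].start();
        }
        for (Thread t : depositors) {
            try {
                t.join();
            } catch (InterruptedException e) {
                throw new RuntimeException(e);
            }
        }
        return account.balance();
    }

    public static void main(String[] args) {
        System.out.println(run(4, 1000));
    }
}
```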
C# has a predefined class, Monitor, which is designed for implementing monitors.
This section introduces the fundamental concept of message passing in concurrency. Note that this concept of message passing is unrelated to the message passing used in object-oriented programming to enact methods.
The first efforts to design languages that provide the capability for message passing among concurrent tasks were those of Brinch Hansen (1978) and Hoare (1978). These pioneer developers of message passing also developed a technique for handling the problem of what to do when multiple simultaneous requests were made by other tasks to communicate with a given task. It was decided that some form of nondeterminism was required to provide fairness in choosing which among those requests would be taken first. This fairness can be defined in various ways, but in general, it means that all requesters are provided an equal chance of communicating with a given task (assuming that every requester has the same priority). Nondeterministic constructs for statement-level control, called guarded commands, were introduced by Dijkstra (1975). Guarded commands are discussed in Chapter 8. Guarded commands are the basis of the construct designed for controlling message passing.
Message passing can be either synchronous or asynchronous. Here, we describe synchronous message passing. The basic concept of synchronous message passing is that tasks are often busy, and when busy, they cannot be interrupted by other units. Suppose task A and task B are both in execution, and A wishes to send a message to B. Clearly, if B is busy, it is not desirable to allow another task to interrupt it. That would disrupt B’s current processing. Furthermore, messages usually cause associated processing in the receiver, which might not be sensible if other processing is incomplete. The alternative is to provide a linguistic mechanism that allows a task to specify to other tasks when it is ready to receive messages. This approach is somewhat like an executive who instructs his or her secretary to hold all incoming calls until another activity, perhaps an important conversation, is completed. Later, when the current conversation is complete, the executive tells the secretary that he or she is now willing to talk to one of the callers who has been placed on hold.
A task can be designed so that it can suspend its execution at some point, either because it is idle or because it needs information from another unit before it can continue. This is like a person who is waiting for an important call. In some cases, there is nothing else to do but sit and wait. However, if task A is waiting for a message at the time task B sends that message, the message can be transmitted. This actual transmission of the message is called a rendezvous. Note that a rendezvous can occur only if both the sender and receiver want it to happen. During a rendezvous, the information of the message can be transmitted in either or both directions.
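The rendezvous idea can be approximated in Java, although Java has no rendezvous construct of its own. The sketch below assumes the library class java.util.concurrent.SynchronousQueue, whose put blocks until another thread performs a matching take, so the handoff happens only when both sides are ready. A second queue carries a reply, to show information moving in both directions. The class name RendezvousDemo and the doubling of the value are hypothetical.

```java
import java.util.concurrent.SynchronousQueue;

class RendezvousDemo {
    // Sends one request to a receiver thread and returns the receiver's reply.
    static int run(int request) {
        SynchronousQueue<Integer> toReceiver = new SynchronousQueue<>();
        SynchronousQueue<Integer> toSender = new SynchronousQueue<>();

        Thread receiver = new Thread(() -> {
            try {
                int item = toReceiver.take();   // receiver signals it is ready for a message
                toSender.put(item * 2);         // information flows back to the sender
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        receiver.start();

        try {
            toReceiver.put(request);            // blocks until the receiver takes the message
            int reply = toSender.take();        // sender waits here until the reply arrives
            receiver.join();
            return reply;
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run(21));
    }
}
```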
Both cooperation and competition synchronization of tasks can be conveniently handled with the message-passing model, as described in the following section.
This section describes the support for concurrency provided by Ada. Ada 83 supports only synchronous message passing.
The Ada design for tasks is partially based on the work of Brinch Hansen and Hoare in that message passing is the design basis and nondeterminism is used to choose among the tasks that have sent messages.
The full Ada tasking model is complex, and the following discussion of it is limited. The focus here will be on the Ada version of the synchronous message-passing mechanism.
Ada tasks can be more active than monitors. Monitors are passive entities that provide management services for the shared data they store. They provide their services, though only when those services are requested. When used to manage shared data, Ada tasks can be thought of as managers that can reside with the resource they manage. They have several mechanisms, some deterministic and some nondeterministic, that allow them to choose among competing requests for access to their resources.
There are two syntactic parts to an Ada task—a specification part and a body part—both with the same name. The interface of a task is its entry points or locations where it can accept messages from other tasks. Because these entry points are part of its interface, it is natural that they be listed in the specification part of a task. Because a rendezvous can involve an exchange of information, messages can have parameters; therefore, task entry points must also allow parameters, which must also be described in the specification part. In appearance, a task specification is similar to the package specification for an abstract data type.
As an example of an Ada task specification, consider the following code, which declares a single entry point named Entry_1 with one in-mode parameter:
task Task_Example is
  entry Entry_1(Item : in Integer);
end Task_Example;
A task body must include some syntactic form of the entry points that correspond to the entry clauses in that task’s specification part. In Ada, these task body entry points are specified by clauses that are introduced by the accept reserved word. An accept clause is defined as the range of statements beginning with the accept reserved word and ending with the matching end reserved word. accept clauses are themselves relatively simple, but other constructs in which they can be embedded can make their semantics complex. A simple accept clause has the following form:
accept entry_name (formal parameters) do
  ...
end entry_name;
The accept entry name matches the name in an entry clause in the associated task specification part. The optional parameters provide the means of communicating data between the caller and the called task. The statements between the do and the end define the operations that take place during the rendezvous. These statements are together called the accept clause body. During the actual rendezvous, the sender task is suspended.
Whenever an accept clause receives a message that it is not willing to accept, for whatever reason, the sender task must be suspended until the accept clause in the receiver task is ready to accept the message. Of course, the accept clause must also remember the sender tasks that have sent messages that were not accepted. For this purpose, each accept clause in a task has a queue associated with it that stores a list of other tasks that have unsuccessfully attempted to communicate with it.
The following is the skeletal body of the task whose specification was given previously:
task body Task_Example is
begin
  loop
    accept Entry_1(Item : in Integer) do
      ...
    end Entry_1;
  end loop;
end Task_Example;
The accept clause of this task body is the implementation of the entry named Entry_1 in the task specification. If the execution of Task_Example begins and reaches the Entry_1 accept clause before any other task sends a message to Entry_1, Task_Example is suspended. If another task sends a message to Entry_1 while Task_Example is suspended at its accept, a rendezvous occurs and the accept clause body is executed. Then, because of the loop, execution proceeds back to the accept. If no other task has sent a message to Entry_1, execution is again suspended to wait for the next message.
A rendezvous can occur in two basic ways in this simple example. First, the receiver task, Task_Example, can be waiting for another task to send a message to the Entry_1 entry. When the message is sent, the rendezvous occurs. This is the situation described earlier. Second, the receiver task can be busy with one rendezvous, or with some other processing not associated with a rendezvous, when another task attempts to send a message to the same entry. In that case, the sender is suspended until the receiver is free to accept that message in a rendezvous. If several messages arrive while the receiver is busy, the senders are queued to wait their turn for a rendezvous.
The two rendezvous just described are illustrated with the timeline diagrams in Figure 13.4.
Tasks need not have entry points. Such tasks are called actor tasks because they do not wait for a rendezvous in order to do their work. Actor tasks can rendezvous with other tasks by sending them messages. In contrast to actor tasks, a task can have accept clauses but not have any code outside those accept clauses, so it can only react to other tasks. Such a task is called a server task.
An Ada task that sends a message to another task must know the entry name in that task. However, the opposite is not true: A task entry need not know the name of the task from which it will accept messages. This asymmetry is in contrast to the design of the language known as CSP, or Communicating Sequential Processes (Hoare, 1978). In CSP, which also uses the message-passing model of concurrency, tasks accept messages only from explicitly named tasks. The disadvantage of this is that libraries of tasks cannot be built for general use.
The usual graphical method of describing a rendezvous in which task A sends a message to task B is shown in Figure 13.5.
Tasks are declared in the declaration part of a package, subprogram, or block. Statically created tasks begin executing at the same time as the statements in the code to which that declarative part is attached. For example, a task declared in a main program begins execution at the same time as the first statement in the code body of the main program. Task termination, which is a complex issue, is discussed later in this section.
Tasks may have any number of entries. The order in which the associated accept clauses appear in the task dictates the order in which messages can be accepted. If a task has more than one entry point and requires them to be able to receive messages in any order, the task uses a select statement to enclose the entries. For example, suppose a task models the activities of a bank teller, who must serve customers at a walk-up station inside the bank and also serve customers at a drive-up window. The following skeletal teller task illustrates a select construct:
task body Teller is
begin
  loop
    select
      accept Drive_Up(formal parameters) do
        ...
      end Drive_Up;
      ...
    or
      accept Walk_Up(formal parameters) do
        ...
      end Walk_Up;
      ...
    end select;
  end loop;
end Teller;
In this task, there are two accept clauses, Walk_Up and Drive_Up, each of which has an associated queue. The action of the select, when it is executed, is to examine the queues associated with the two accept clauses. If one of the queues is empty, but the other contains at least one waiting message (customer), the accept clause associated with the waiting message or messages has a rendezvous with the task that sent the first message that was received. If both accept clauses have empty queues, the select waits until one of the entries is called. If both accept clauses have nonempty queues, one of the accept clauses is nondeterministically chosen to have a rendezvous with one of its callers. The loop forces the select statement to be executed repeatedly, forever.
The end of the accept clause marks the end of the code that assigns or references the formal parameters of the accept clause. The code, if there is any, between an accept clause and the next or (or the end select, if the accept clause is the last one in the select) is called the extended accept clause. The extended accept clause is executed only after the associated (immediately preceding) accept clause is executed. This execution of the extended accept clause is not part of the rendezvous and can take place concurrently with the execution of the calling task. The sender is suspended during the rendezvous, but it is put back in the ready queue when the end of the accept clause is reached. If an accept clause has no formal parameters, the do-end is not required, and the accept clause can consist entirely of an extended accept clause. Such an accept clause would be used exclusively for synchronization. Extended accept clauses are illustrated in the Buf_Task task in Section 13.6.3.
Each accept clause can have a guard attached, in the form of a when clause, that can delay rendezvous. For example,
when not Full(Buffer) =>
  accept Deposit(New_Value) do
    ...
  end Deposit;
An accept clause with a when clause is either open or closed. If the Boolean expression of the when clause is currently true, that accept clause is called open; if the Boolean expression is false, the accept clause is called closed. An accept clause that does not have a guard is always open. An open accept clause is available for rendezvous; a closed accept clause cannot rendezvous.
Suppose there are several guarded accept clauses in a select clause. Such a select clause is usually placed in an infinite loop. The loop causes the select clause to be executed repeatedly, with each when clause evaluated on each repetition. Each repetition causes a list of open accept clauses to be constructed. If exactly one of the open clauses has a nonempty queue, a message from that queue is taken and a rendezvous takes place. If more than one of the open accept clauses has a nonempty queue, one queue is chosen nondeterministically, a message is taken from that queue, and a rendezvous takes place. If the queues of all open clauses are empty, the task waits for a message to arrive at one of those accept clauses, at which time a rendezvous will occur. If a select is executed and every accept clause is closed, the exception Program_Error is raised at run time. This possibility can be avoided either by making sure one of the when clauses is always true or by adding an else clause in the select. An else clause can include any sequence of statements, except an accept clause.
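The selection rule for guarded alternatives can be sketched as ordinary sequential code. The Java fragment below is a hypothetical model, not how an Ada run-time system is implemented: each Alternative holds a guard and a queue of pending messages, and choose performs one evaluation of the select. It does not model else or terminate, it returns null at the point where an Ada task would instead wait for a caller, and it signals the all-closed case with an IllegalStateException.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.Random;
import java.util.function.BooleanSupplier;

// One alternative of a hypothetical select: a guard plus the queue of pending messages.
class Alternative {
    final String name;
    final BooleanSupplier guard;
    final Queue<Integer> pending = new ArrayDeque<>();

    Alternative(String name, BooleanSupplier guard) {
        this.name = name;
        this.guard = guard;
    }
}

class SelectSketch {
    private static final Random RANDOM = new Random();

    // Returns the alternative chosen for rendezvous, or null if every open queue is empty.
    static Alternative choose(List<Alternative> alternatives) {
        List<Alternative> open = new ArrayList<>();
        for (Alternative a : alternatives) {
            if (a.guard.getAsBoolean()) open.add(a);   // guards are evaluated on every execution
        }
        if (open.isEmpty()) {
            // Stand-in for the run-time error raised when every alternative is closed.
            throw new IllegalStateException("all alternatives are closed");
        }
        List<Alternative> ready = new ArrayList<>();
        for (Alternative a : open) {
            if (!a.pending.isEmpty()) ready.add(a);
        }
        if (ready.isEmpty()) return null;
        return ready.get(RANDOM.nextInt(ready.size()));  // arbitrary choice among ready alternatives
    }

    public static void main(String[] args) {
        int[] filled = {0};
        Alternative deposit = new Alternative("Deposit", () -> filled[0] < 2);
        Alternative fetch = new Alternative("Fetch", () -> filled[0] > 0);
        deposit.pending.add(7);
        fetch.pending.add(0);
        List<Alternative> alternatives = new ArrayList<>();
        alternatives.add(deposit);
        alternatives.add(fetch);
        System.out.println(choose(alternatives).name);  // Fetch is closed while the buffer is empty
    }
}
```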
A select clause may have a special statement, terminate, that is selected only when it is open and no other accept clause is open. A terminate clause, when selected, means that the task is finished with its job but is not yet terminated. Task termination is discussed later in this section.
The features described so far provide for cooperation synchronization and communication among tasks. Next, we discuss how mutually exclusive access to shared data structures can be enforced in Ada.
If access to a data structure is to be controlled by a task, then mutually exclusive access can be achieved by declaring the data structure within a task. The semantics of task execution usually guarantees mutually exclusive access to the structure, because only one accept clause in the task can be active at a given time. The only exceptions to this occur when tasks are nested in procedures or other tasks. For example, if a task that defines a shared data structure has a nested task, that nested task can also access the shared structure, which could destroy the integrity of the data. Thus, tasks that are meant to control access to a shared data structure should not define tasks.
The following is an example of an Ada task that implements a monitor for a buffer. The buffer behaves very much like the buffer in Section 13.3, in which synchronization is controlled with semaphores.
task Buf_Task is
  entry Deposit(Item : in Integer);
  entry Fetch(Item : out Integer);
end Buf_Task;

task body Buf_Task is
  Bufsize : constant Integer := 100;
  Buf : array (1..Bufsize) of Integer;
  Filled : Integer range 0..Bufsize := 0;
  Next_In,
  Next_Out : Integer range 1..Bufsize := 1;
begin
  loop
    select
      when Filled < Bufsize =>
        accept Deposit(Item : in Integer) do
          Buf(Next_In) := Item;
        end Deposit;
        Next_In := (Next_In mod Bufsize) + 1;
        Filled := Filled + 1;
    or
      when Filled > 0 =>
        accept Fetch(Item : out Integer) do
          Item := Buf(Next_Out);
        end Fetch;
        Next_Out := (Next_Out mod Bufsize) + 1;
        Filled := Filled - 1;
    end select;
  end loop;
end Buf_Task;
In this example, both accept clauses are extended. These extended clauses can be executed concurrently with the tasks that called the associated accept clauses.
The tasks for a producer and a consumer that could use Buf_Task have the following form:
task Producer;
task Consumer;

task body Producer is
  New_Value : Integer;
begin
  loop
    -- produce New_Value --
    Buf_Task.Deposit(New_Value);
  end loop;
end Producer;

task body Consumer is
  Stored_Value : Integer;
begin
  loop
    Buf_Task.Fetch(Stored_Value);
    -- consume Stored_Value --
  end loop;
end Consumer;
As we have seen, access to shared data can be controlled by enclosing the data in a task and allowing access only through task entries, which implicitly provide competition synchronization. One problem with this method is that it is difficult to implement the rendezvous mechanism efficiently. Ada 95 protected objects provide an alternative method of providing competition synchronization that need not involve the rendezvous mechanism.
A protected object is not a task; it is more like a monitor, as described in Section 13.4. Protected objects can be accessed either by protected subprograms or by entries that are syntactically similar to the accept clauses in tasks. The protected subprograms can be either protected procedures, which provide mutually exclusive read-write access to the data of the protected object, or protected functions, which provide concurrent read-only access to that data. Entries differ from protected subprograms in that they can have guards.
Within the body of a protected procedure, the current instance of the enclosing protected unit is defined to be a variable; within the body of a protected function, the current instance of the enclosing protected unit is defined to be a constant, which allows concurrent read-only access.
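The distinction between exclusive read-write access and concurrent read-only access has a rough Java counterpart in the library class java.util.concurrent.locks.ReentrantReadWriteLock. The sketch below is an analogy only, with hypothetical names ProtectedCell and ReadWriteDemo: the write lock stands in for a protected procedure and the read lock for a protected function. The probe method checks, from a second thread, that another reader is admitted while a read lock is held and that a writer is not.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical analogue of a protected object with one procedure and one function.
class ProtectedCell {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private int value = 0;

    // Like a protected procedure: exclusive read-write access.
    void set(int newValue) {
        lock.writeLock().lock();
        try {
            value = newValue;
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Like a protected function: several readers may hold the read lock at once.
    int get() {
        lock.readLock().lock();
        try {
            return value;
        } finally {
            lock.readLock().unlock();
        }
    }
}

class ReadWriteDemo {
    // While the caller holds a read lock, a second thread tries both locks.
    // Returns {readerAdmitted, writerAdmitted}.
    static boolean[] probe() {
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        boolean[] admitted = new boolean[2];
        Thread other = new Thread(() -> {
            admitted[0] = lock.readLock().tryLock();
            if (admitted[0]) lock.readLock().unlock();
            admitted[1] = lock.writeLock().tryLock();
            if (admitted[1]) lock.writeLock().unlock();
        });
        lock.readLock().lock();
        try {
            other.start();
            other.join();
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        } finally {
            lock.readLock().unlock();
        }
        return admitted;
    }

    public static void main(String[] args) {
        boolean[] admitted = probe();
        System.out.println(admitted[0] + " " + admitted[1]);
    }
}
```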
Entry calls to a protected object provide synchronous communication with one or more tasks using the same protected object. These entry calls provide access similar to that provided to the data enclosed in a task.
The buffer problem that is solved with a task in the previous subsection can be more simply solved with a protected object. Note that this example does not include protected subprograms.
protected Buffer is
  entry Deposit(Item : in Integer);
  entry Fetch(Item : out Integer);
private
  Bufsize : constant Integer := 100;
  Buf : array (1..Bufsize) of Integer;
  Filled : Integer range 0..Bufsize := 0;
  Next_In,
  Next_Out : Integer range 1..Bufsize := 1;
end Buffer;

protected body Buffer is
  entry Deposit(Item : in Integer)
    when Filled < Bufsize is
  begin
    Buf(Next_In) := Item;
    Next_In := (Next_In mod Bufsize) + 1;
    Filled := Filled + 1;
  end Deposit;

  entry Fetch(Item : out Integer) when Filled > 0 is
  begin
    Item := Buf(Next_Out);
    Next_Out := (Next_Out mod Bufsize) + 1;
    Filled := Filled - 1;
  end Fetch;
end Buffer;
Using the general message-passing model of concurrency to construct monitors is like using Ada packages to support abstract data types—both are tools that are more general than is necessary. Protected objects are a better way to provide synchronized access to shared data.
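For comparison, the same buffer can be sketched as a Java monitor. This is a hypothetical counterpart, not a translation endorsed by the text: the class BoundedBuffer uses synchronized methods for competition synchronization, and each entry guard becomes a while loop around wait, with notifyAll called after every change because a guard may have become true. Indexing is zero-based, unlike the Ada version.

```java
// Hypothetical Java counterpart of the protected Buffer: guards become wait loops.
class BoundedBuffer {
    private final int[] buf;
    private int filled = 0;
    private int nextIn = 0;
    private int nextOut = 0;

    BoundedBuffer(int size) {
        buf = new int[size];
    }

    synchronized void deposit(int item) throws InterruptedException {
        while (filled == buf.length) wait();     // guard: Filled < Bufsize
        buf[nextIn] = item;
        nextIn = (nextIn + 1) % buf.length;
        filled++;
        notifyAll();                             // a waiting fetch may now proceed
    }

    synchronized int fetch() throws InterruptedException {
        while (filled == 0) wait();              // guard: Filled > 0
        int item = buf[nextOut];
        nextOut = (nextOut + 1) % buf.length;
        filled--;
        notifyAll();                             // a waiting deposit may now proceed
        return item;
    }
}

class BufferDemo {
    // A producer thread deposits 1..n while the caller fetches n items; returns their sum.
    static long run(int n, int capacity) {
        BoundedBuffer buffer = new BoundedBuffer(capacity);
        Thread producer = new Thread(() -> {
            try {
                for (int i = 1; i <= n; i++) buffer.deposit(i);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();
        long sum = 0;
        try {
            for (int i = 0; i < n; i++) sum += buffer.fetch();
            producer.join();
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(run(1000, 4));
    }
}
```

In the Ada protected object the guards are declared on the entries and evaluated by the run-time system; in this sketch the programmer writes the waiting and the notification by hand, which is where cooperation synchronization errors can creep in.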
In the absence of distributed processors with independent memories, the choice between monitors and tasks with message passing as a means of implementing synchronized access to shared data in a concurrent environment is somewhat a matter of taste. However, in the case of Ada, protected objects are clearly better than tasks for supporting concurrent access to shared data. Not only is the code simpler; it is also much more efficient.
For distributed systems, message passing is a better model for concurrency, because it naturally supports the concept of separate processes executing in parallel on separate processors.
The concurrent units in Java are methods named run, whose code can be in concurrent execution with other such methods (of other objects) and with the main method. The process in which the run methods execute is called a thread. Java’s threads are lightweight tasks, which means that they all run in the same address space. This is different from Ada tasks, which are heavyweight threads (they run in their own address spaces). One important result of this difference is that threads require far less overhead than Ada’s tasks.
There are two ways to define a class with a run method. One of these is to define a subclass of the predefined class Thread and override its run method. However, if the new subclass has a necessary natural parent, then defining it as a subclass of Thread obviously will not work. In these situations, we define a subclass that inherits from its natural parent and implements the Runnable interface. Runnable provides the run method protocol, so any class that implements Runnable must define run. An object of the class that implements Runnable is passed to the Thread constructor. So, this approach still requires a Thread object, as will be seen in the example in Section 13.7.5.
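The second approach can be sketched as follows. The classes Shape and Square are hypothetical: Square already has a natural parent, so it cannot also extend Thread and instead implements Runnable. The Runnable object is then handed to a Thread constructor, and that Thread object is what gets started.

```java
// Hypothetical natural parent class.
class Shape {
    double area() {
        return 0.0;
    }
}

// Square already extends Shape, so it implements Runnable rather than extending Thread.
class Square extends Shape implements Runnable {
    private final double side;
    volatile double computedArea;   // written by run, read by the caller after join

    Square(double side) {
        this.side = side;
    }

    @Override
    double area() {
        return side * side;
    }

    @Override
    public void run() {
        computedArea = area();      // the work done in the new thread
    }
}

class RunnableDemo {
    // Runs a Square's run method in a new thread and returns the area it computed.
    static double run(double side) {
        Square sq = new Square(side);
        Thread t = new Thread(sq);  // the Runnable object is passed to the Thread constructor
        t.start();
        try {
            t.join();               // wait for the thread to finish
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
        return sq.computedArea;
    }

    public static void main(String[] args) {
        System.out.println(run(3.0));
    }
}
```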
In Ada, tasks can be either actors or servers and tasks communicate with each other through accept clauses. Java run methods are all actors and there is no mechanism for them to communicate with each other, except for the join method (see Section 13.7.1) and through shared data.
Java threads is a complex topic—this section only provides an introduction to its simplest but most useful parts.
The Thread class is not the natural parent of any other classes. It provides some services for its subclasses, but it is not related in any natural way to their computational purposes. Thread is the only class available for creating concurrent Java programs. As previously stated, Section 13.7.5 will briefly discuss the use of the Runnable interface.
The Thread class includes five constructors and a collection of methods and constants. The run method, which describes the actions of the thread, is always overridden by subclasses of Thread. The start method of Thread starts its thread as a concurrent unit by calling its run method.5 The call to start is unusual in that control returns immediately to the caller, which then continues its execution, in parallel with the newly started run method.
Following is a skeletal subclass of Thread and a code fragment that creates an object of the subclass and starts the run method’s execution in the new thread:
class MyThread extends Thread {
public void run() { . . . }
}
. . .
Thread myTh = new MyThread();
myTh.start();
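The difference between calling start and calling run directly can be seen in a small sketch. The class names StartDemo and Recorder are hypothetical, and join (described later in this section) is used only so that the caller can wait for the result before reading it.

```java
// Sketch: start() executes run() in a new thread; calling run() directly
// executes it in the caller's thread, like any ordinary method.
class StartDemo {
    // Hypothetical Thread subclass that records which thread executed run().
    static class Recorder extends Thread {
        volatile String ranOn;          // name of the thread that ran run()

        @Override
        public void run() {
            ranOn = Thread.currentThread().getName();
        }
    }

    // Starts a Recorder as a concurrent unit and reports where run() executed.
    static String nameWhenStarted() {
        Recorder r = new Recorder();
        r.start();                      // control returns immediately
        try {
            r.join();                   // wait so that ranOn has been set
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return r.ranOn;
    }

    // Calls run() as an ordinary method; no new thread is involved.
    static String nameWhenCalledDirectly() {
        Recorder r = new Recorder();
        r.run();
        return r.ranOn;
    }

    public static void main(String[] args) {
        System.out.println("start(): " + nameWhenStarted());
        System.out.println("run():   " + nameWhenCalledDirectly());
    }
}
```

Only the call through start creates a concurrent unit, which is why the fragment above calls start rather than run.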
When a Java application program begins execution, a new thread is created (in which the main method will run) and main is called. Therefore, all Java application programs run in threads.
When a program has multiple threads, a scheduler must determine which thread or threads will run at any given time. In many cases, there is only a single processor available, so only one thread actually runs at a time. It is difficult to give a precise description of how the Java scheduler works, because the different implementations (Solaris, Windows, and so on) do not necessarily schedule threads in exactly the same way. Typically, however, the scheduler gives equal-size time slices to each ready thread in round-robin fashion, assuming all of these threads have the same priority. Section 13.7.2 describes how different priorities can be given to different threads.
The Thread class provides several methods for controlling the execution of threads. The yield method, which takes no parameters, is a request from the running thread to surrender the processor voluntarily.6 The thread is put immediately in the task-ready queue, making it ready to run. The scheduler then chooses the highest-priority thread from the task-ready queue. If there are no other ready threads with priority higher than the one that just yielded the processor, it may also be the next thread to get the processor.
The sleep method has a single parameter, which is the integer number of milliseconds that the caller of sleep wants the thread to be blocked. After the specified number of milliseconds has passed, the thread will be put in the task-ready queue. Because there is no way to know how long a thread will be in the task-ready queue before it runs, the parameter to sleep is the minimum amount of time the thread will not be in execution. The sleep method can throw an InterruptedException, which must be handled in the method that calls sleep. Exceptions are described in detail in Chapter 14.
The join method is used to force a method to delay its execution until the run method of another thread has completed its execution. join is used when the processing of a method cannot continue until the work of the other thread is complete. For example, we might have the following run method:
public void run() {
  . . .
  Thread myTh = new Thread();
  myTh.start();
  // do part of the computation of this thread
  try {
    myTh.join(); // Wait for myTh to complete
  }
  catch(InterruptedException e) { . . . } // join can throw this exception
  // do the rest of the computation of this thread
}
The join method puts the thread that calls it in the blocked state, which can be ended only by the completion of the thread on which join was called. If that thread happens to be blocked, there is the possibility of deadlock. To prevent this, join can be called with a parameter, which is the time limit in milliseconds of how long the calling thread will wait for the called thread to complete. For example, the following call to join will cause the calling thread to wait two seconds for myTh to complete. If it has not completed its execution after two seconds have passed, the calling thread is put back in the ready queue, which means that it will continue its execution as soon as it is scheduled.
myTh.join(2000);
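A timed join only limits how long the caller waits; it does not stop the other thread. The following is a minimal sketch of that behavior, not a definitive pattern. The class name TimedJoinDemo, the 50-millisecond limit, and the use of sleep to stand in for a long computation are all assumptions made for illustration. The worker is ended with interrupt, which is discussed below.

```java
// Sketch: join with a time limit gives up waiting, but the worker keeps going.
class TimedJoinDemo {
    // Returns true if the worker was still running when join(50) returned.
    static boolean workerOutlivesTimedJoin() {
        Thread worker = new Thread(() -> {
            try {
                Thread.sleep(60_000);       // stands in for a long computation
            } catch (InterruptedException e) {
                // interrupted by the caller below: fall through and end run
            }
        });
        worker.setDaemon(true);             // never keeps the JVM alive
        worker.start();
        boolean stillAlive = false;
        try {
            worker.join(50);                // wait at most 50 milliseconds
            stillAlive = worker.isAlive();  // the worker has not finished
            worker.interrupt();             // ask the worker to end early
            worker.join();                  // now wait without a limit
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return stillAlive;
    }

    public static void main(String[] args) {
        System.out.println("worker alive after join(50): "
                           + workerOutlivesTimedJoin());
    }
}
```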
Early versions of Java included three more Thread methods: stop, suspend, and resume. All three of these have been deprecated because of safety problems. The stop method is sometimes overridden with a simple method that destroys the thread by setting its reference variable to null.
The normal way a run method ends its execution is by reaching the end of its code. However, in many cases, threads run until told to terminate. Regarding this, there is the question of how a thread can determine whether it should continue or end. The interrupt method is one way to communicate to a thread that it should stop. This method does not stop the thread; rather, it sends the thread a message that actually just sets a bit in the thread object, which can be checked by the thread. The bit is checked with the predicate method, isInterrupted. This is not a complete solution, because the thread one is attempting to interrupt may be sleeping or waiting at the time the interrupt method is called, which means that it will not be checking to see if it has been interrupted. For these situations, the interrupt method also causes an InterruptedException to be thrown in the interrupted thread, which awakens it (from sleeping or waiting). So, a thread can periodically check to see whether it has been interrupted and if so, whether it can terminate. The thread cannot miss the interrupt, because if it was asleep or waiting when the interrupt occurred, it will be awakened by the interrupt. Actually, there are more details to the actions and uses of interrupt, but they are not covered here (Arnold et al., 2006).
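As a rough illustration of a thread that runs until told to terminate, the sketch below combines both ways of noticing an interrupt: testing the bit with isInterrupted and catching InterruptedException while sleeping. The names InterruptDemo and Poller, and the 10-millisecond sleep, are hypothetical choices.

```java
// Sketch: a thread that keeps running until another thread interrupts it.
class InterruptDemo {
    // Hypothetical worker that both polls its interrupt bit and sleeps.
    static class Poller extends Thread {
        volatile boolean reachedEnd = false;

        @Override
        public void run() {
            while (!isInterrupted()) {      // check the interrupt bit
                try {
                    Thread.sleep(10);       // an interrupt awakens the sleeper
                } catch (InterruptedException e) {
                    break;                  // interrupted while sleeping
                }
            }
            reachedEnd = true;              // run ends by reaching its end
        }
    }

    // Starts a Poller, interrupts it, and reports whether it terminated.
    static boolean stopsWhenInterrupted() {
        Poller p = new Poller();
        p.setDaemon(true);
        p.start();
        p.interrupt();                      // a request, not a forced stop
        try {
            p.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return p.reachedEnd && !p.isAlive();
    }

    public static void main(String[] args) {
        System.out.println("stopped: " + stopsWhenInterrupted());
    }
}
```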
The priorities of threads need not all be the same. A thread’s default priority initially is the same as the thread that created it. If main creates a thread, its default priority is the constant NORM_PRIORITY, which is usually 5. Thread defines two other priority constants, MAX_PRIORITY and MIN_PRIORITY, whose values are usually 10 and 1, respectively.7 The priority of a thread can be changed with the method setPriority. The new priority can be any of the predefined constants or any other number between MIN_PRIORITY and MAX_PRIORITY. The getPriority method returns the current priority of a thread. The priority constants are defined in Thread.
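The priority methods can be exercised without even starting a thread. The following sketch, with the hypothetical class name PriorityDemo, creates a thread object only to inspect and change its priority.

```java
// Sketch: reading and changing a thread's priority.
class PriorityDemo {
    // Returns {priority inherited at creation, priority after setPriority}.
    static int[] priorities() {
        Thread t = new Thread(() -> { });   // never started; only inspected
        int inherited = t.getPriority();    // same as the creating thread's
        t.setPriority(Thread.MIN_PRIORITY);
        return new int[] { inherited, t.getPriority() };
    }

    public static void main(String[] args) {
        int[] p = priorities();
        System.out.println("inherited priority: " + p[0]);
        System.out.println("after setPriority:  " + p[1]);
    }
}
```

A call to setPriority with a value outside the range MIN_PRIORITY to MAX_PRIORITY throws IllegalArgumentException.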
When there are threads with different priorities, the scheduler’s behavior is controlled by those priorities. When the executing thread is blocked or killed or the time slice for it expires, the scheduler chooses the thread from the task-ready queue that has the highest priority. A thread with lower priority will run only if one of higher priority is not in the task-ready queue when the opportunity arises.
The java.util.concurrent package defines the Semaphore class. Objects of this class implement counting semaphores. A counting semaphore has a counter, but no queue for storing thread descriptors. The Semaphore class defines two methods, acquire and release, which correspond to the wait and release operations described in Section 13.3.
The basic constructor for Semaphore takes one integer parameter, which initializes the semaphore’s counter. For example, the following could be used to initialize the fullspots and emptyspots semaphores for the buffer example of Section 13.3.2:
fullspots = new Semaphore(0);
emptyspots = new Semaphore(BUFLEN);
The deposit operation of the producer method would appear as follows:
emptyspots.acquire();
deposit(value);
fullspots.release();
Likewise, the fetch operation of the consumer method would appear as follows:
fullspots.acquire();
fetch(value);
emptyspots.release();
The deposit and fetch methods could use the approach used in Section 13.7.4 to provide the competition synchronization required for the accesses to the buffer.
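The fragments above can be assembled into one possible complete class. This is a sketch under stated assumptions, not the only way to write it: the class name SemaphoreBuffer, the methods put, take, and transfer, and the value of BUFLEN are hypothetical. Competition synchronization is supplied by making deposit and fetch synchronized methods.

```java
import java.util.concurrent.Semaphore;

// Sketch: a bounded buffer guarded by two counting semaphores.
class SemaphoreBuffer {
    private static final int BUFLEN = 4;            // assumed buffer length
    private final int[] buf = new int[BUFLEN];
    private int nextIn = 0, nextOut = 0;

    private final Semaphore fullspots = new Semaphore(0);
    private final Semaphore emptyspots = new Semaphore(BUFLEN);

    // Producer side: wait for an empty position, then deposit.
    void put(int value) throws InterruptedException {
        emptyspots.acquire();
        deposit(value);
        fullspots.release();
    }

    // Consumer side: wait for a filled position, then fetch.
    int take() throws InterruptedException {
        fullspots.acquire();
        int value = fetch();
        emptyspots.release();
        return value;
    }

    // Competition synchronization for the accesses to the buffer.
    private synchronized void deposit(int value) {
        buf[nextIn] = value;
        nextIn = (nextIn + 1) % BUFLEN;
    }

    private synchronized int fetch() {
        int value = buf[nextOut];
        nextOut = (nextOut + 1) % BUFLEN;
        return value;
    }

    // Moves the values 1..n through the buffer and returns their sum.
    static int transfer(int n) {
        SemaphoreBuffer b = new SemaphoreBuffer();
        Thread producer = new Thread(() -> {
            try {
                for (int i = 1; i <= n; i++) b.put(i);
            } catch (InterruptedException e) {
                // end early if interrupted
            }
        });
        producer.setDaemon(true);
        producer.start();
        int sum = 0;
        try {
            for (int i = 0; i < n; i++) sum += b.take();
            producer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println("sum of 1..100 = " + transfer(100));
    }
}
```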
Java methods (but not constructors) can be specified to be synchronized. A synchronized method called through a specific object must complete its execution before any other synchronized method can run on that object. Competition synchronization on an object is implemented by specifying that the methods that access shared data are synchronized. The synchronized mechanism is implemented as follows: Every Java object has a lock. Synchronized methods must acquire the lock of the object before they are allowed to execute, which prevents other synchronized methods from executing on the object during that time. A synchronized method releases the lock on the object on which it runs when it completes its execution, even if that completion is due to an exception. Consider the following skeletal class definition:
class ManageBuf {
private int [] buf = new int [100];
. . .
public synchronized void deposit(int item) { . . . }
public synchronized int fetch() { . . . }
. . .
}
The two methods defined in ManageBuf are both defined to be synchronized, which prevents them from interfering with each other while executing on the same object, when they are called by separate threads.
An object whose methods are all synchronized is effectively a monitor. Note that an object may have one or more synchronized methods, as well as one or more unsynchronized methods. An unsynchronized method can run on an object at any time, even during the execution of a synchronized method.
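A small monitor-like class makes the effect of synchronized methods concrete. In the sketch below, the class name SyncCounter and the helper countWith are hypothetical. Several threads increment one shared counter, and because both methods are synchronized, no increments are lost.

```java
// Sketch: an object whose methods are all synchronized acts as a monitor.
class SyncCounter {
    private int count = 0;                  // the shared data

    public synchronized void increment() {  // holds the object's lock
        count++;
    }

    public synchronized int get() {
        return count;
    }

    // Runs the given number of threads, each incrementing perThread times.
    static int countWith(int threads, int perThread) {
        SyncCounter c = new SyncCounter();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) c.increment();
            });
            workers[i].start();
        }
        try {
            for (Thread w : workers) w.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return c.get();
    }

    public static void main(String[] args) {
        System.out.println("final count: " + countWith(4, 10000));
    }
}
```

If synchronized were removed from increment, the final count could be smaller than expected, because count++ is a read followed by a write.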
In some cases, the number of statements that deal with the shared data structure is significantly less than the number of other statements in the method in which it resides. In these cases, it is better to synchronize the code segment that changes the shared data structure rather than the whole method. This can be done with a so-called synchronized statement, whose general form is as follows:
synchronized (expression){
statements
}
The expression in this code must evaluate to an object and the statement can be a single statement or a compound statement. The object is locked during execution of the statement or compound statement, so the statement or compound statement is executed exactly as if it were the body of a synchronized method.
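As an illustration of the synchronized statement, the following sketch locks only the single statement that changes the shared array, leaving the preceding local computation unlocked. The class name Histogram and its methods are hypothetical.

```java
// Sketch: synchronizing only the statements that touch shared data.
class Histogram {
    private final int[] bins = new int[10];     // the shared data structure

    void record(int value) {
        int bin = Math.floorMod(value, bins.length);  // local work, no lock
        synchronized (this) {                   // lock this object briefly
            bins[bin]++;
        }
    }

    synchronized int total() {
        int sum = 0;
        for (int b : bins) sum += b;
        return sum;
    }

    // Runs the given number of threads, each recording perThread values.
    static int recordConcurrently(int threads, int perThread) {
        Histogram h = new Histogram();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int v = 0; v < perThread; v++) h.record(v);
            });
            workers[i].start();
        }
        try {
            for (Thread w : workers) w.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return h.total();
    }

    public static void main(String[] args) {
        System.out.println("values recorded: " + recordConcurrently(4, 5000));
    }
}
```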
An object that has synchronized methods defined for it must have a queue associated with it that stores the synchronized methods that have attempted to execute on it while it was being operated upon by another synchronized method. Actually, every object has a queue called the intrinsic condition queue. These queues are implicitly supplied. When a synchronized method completes its execution on an object, a method that is waiting in the object’s intrinsic condition queue, if there is such a method, is put in the task-ready queue.
Cooperation synchronization in Java is implemented with the wait, notify, and notifyAll methods, all of which are defined in Object, the root class of all Java classes. All classes except Object inherit these methods. Every object has a wait list of all of the threads that have called wait on the object. The notify method is called to tell one waiting thread that an event that it may have been waiting for has occurred. The specific thread that is awakened by notify cannot be determined, because the Java Virtual Machine (JVM) chooses one from the wait list of the thread object at random. Because of this, along with the fact that the waiting threads may be waiting for different conditions, the notifyAll method is often used, rather than notify. The notifyAll method awakens all of the threads on the object’s wait list by putting them in the task-ready queue.
The methods wait, notify, and notifyAll can be called only from within a synchronized method, because they use the lock placed on an object by such a method. The call to wait is always put in a while loop that is controlled by the condition for which the method is waiting. The while loop is necessary because the notify or notifyAll that awakened the thread may have been called because of a change in a condition other than the one for which the thread was waiting. If it was a call to notifyAll, there is an even smaller chance that the condition being waited for is now true. Because of the use of notifyAll, some other thread may have changed the condition to false since it was last tested.
The wait method can throw InterruptedException, which is a descendant of Exception. Java’s exception handling is discussed in Chapter 14. Therefore, any code that calls wait must also catch InterruptedException. Assuming the condition being waited for is called theCondition, the conventional way to use wait is as follows:
try {
while (!theCondition)
wait();
// Do whatever is needed after theCondition comes true
}
catch(InterruptedException myProblem) { . . . }
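The idiom can be seen at work in a very small class. In the following sketch, the class name Gate, its field open (which plays the role of theCondition), and its methods are hypothetical. One thread waits until another thread opens the gate.

```java
// Sketch: the conventional wait loop, with notifyAll on the other side.
class Gate {
    private boolean open = false;           // plays the role of theCondition

    public synchronized void waitUntilOpen() throws InterruptedException {
        while (!open)                       // retest after every wakeup
            wait();
    }

    public synchronized void open() {
        open = true;
        notifyAll();                        // awaken every waiting thread
    }

    // Returns true if a waiting thread got past the gate once it was opened.
    static boolean waiterProceedsAfterOpen() {
        Gate gate = new Gate();
        boolean[] passed = { false };
        Thread waiter = new Thread(() -> {
            try {
                gate.waitUntilOpen();
                passed[0] = true;
            } catch (InterruptedException e) {
                // leave passed[0] false
            }
        });
        waiter.setDaemon(true);
        waiter.start();
        gate.open();
        try {
            waiter.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return passed[0];
    }

    public static void main(String[] args) {
        System.out.println("waiter proceeded: " + waiterProceedsAfterOpen());
    }
}
```

Because the condition is tested before the first call to wait, the waiter proceeds correctly whether open is called before or after it begins waiting.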
The following program implements a circular queue for storing int values. It illustrates both cooperation and competition synchronization.
// Queue
// This class implements a circular queue for storing int
// values. It includes a constructor for allocating and
// initializing the queue to a specified size. It has
// synchronized methods for inserting values into and
// removing values from the queue.
class Queue {
  private int [] que;
  private int nextIn,
              nextOut,
              filled,
              queSize;
  public Queue(int size) {
    que = new int [size];
    filled = 0;
    nextIn = 0;   // Java array subscripts range from 0 to size - 1
    nextOut = 0;
    queSize = size;
  } //** end of Queue constructor
  public synchronized void deposit (int item)
      throws InterruptedException {
    try {
      while (filled == queSize)
        wait();
      que [nextIn] = item;
      nextIn = (nextIn + 1) % queSize;
      filled++;
      notifyAll();
    } //** end of try clause
    catch(InterruptedException e) {}
  } //** end of deposit method
  public synchronized int fetch()
      throws InterruptedException {
    int item = 0;
    try {
      while (filled == 0)
        wait();
      item = que [nextOut];
      nextOut = (nextOut + 1) % queSize;
      filled--;
      notifyAll();
    } //** end of try clause
    catch(InterruptedException e) {}
    return item;
  } //** end of fetch method
} //** end of Queue class
Notice that the exception handler (catch) does nothing here.
Classes to define producer and consumer objects that could use the Queue class can be defined as follows:
class Producer extends Thread {
  private Queue buffer;
  public Producer(Queue que) {
    buffer = que;
  }
  public void run() {
    int new_item = 0;
    try {
      while (true) {
        //-- Create a new_item
        buffer.deposit(new_item);
      }
    }
    catch(InterruptedException e) {} // deposit is declared to throw this
  }
}
class Consumer extends Thread {
  private Queue buffer;
  public Consumer(Queue que) {
    buffer = que;
  }
  public void run() {
    int stored_item;
    try {
      while (true) {
        stored_item = buffer.fetch();
        //-- Consume the stored_item
      }
    }
    catch(InterruptedException e) {} // fetch is declared to throw this
  }
}
The following code creates a Queue object, and a Producer and a Consumer object, both attached to the Queue object, and starts their execution:
Queue buff1 = new Queue(100);
Producer producer1 = new Producer(buff1);
Consumer consumer1 = new Consumer(buff1);
producer1.start();
consumer1.start();
We could define one or both of the Producer and the Consumer as implementations of the Runnable interface rather than as subclasses of Thread. The only difference is in the first line, which would now appear as follows:
class Producer implements Runnable { . . . }
To create and run an object of such a class, it is still necessary to create a Thread object that is connected to the object. This is illustrated in the following code:
Producer producer1 = new Producer(buff1);
Thread producerThread = new Thread(producer1);
producerThread.start();
Note that the buffer object is passed to the Producer constructor and the Producer object is passed to the Thread constructor.
Java includes some classes for controlling accesses to certain variables that do not include blocking or waiting. The java.util.concurrent.atomic package defines classes that allow certain nonblocking synchronized access to int, long, and boolean primitive type variables, as well as references and arrays. For example, the AtomicInteger class defines getter and setter methods, as well as methods for add, increment, and decrement operations. These operations are all atomic; that is, they cannot be interrupted, so locks are not required to guarantee the integrity of the values of the affected variables in a multithreaded program. This is fine-grained synchronization—just a single variable. Most machines now have atomic instructions for these operations on int and long types, so they are often easy to implement (implicit locks are not required).
The advantage of nonblocking synchronization is efficiency. A nonblocking access that does not occur during contention will be no slower, and usually faster than one that uses synchronized. A nonblocking access that occurs during contention definitely will be faster than one that uses synchronized, because the latter will require suspension and rescheduling of threads.
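A shared counter is a typical use of AtomicInteger. In the sketch below, the class name AtomicCounterDemo and the helper countWith are hypothetical. Several threads increment one counter with incrementAndGet, and no increments are lost even though no lock is used.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Sketch: a shared counter updated atomically, without locks.
class AtomicCounterDemo {
    // Runs the given number of threads, each incrementing perThread times.
    static int countWith(int threads, int perThread) {
        AtomicInteger counter = new AtomicInteger(0);
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++)
                    counter.incrementAndGet();  // atomic read-modify-write
            });
            workers[i].start();
        }
        try {
            for (Thread w : workers) w.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println("final count: " + countWith(4, 10000));
    }
}
```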
Java 5.0 introduced explicit locks as an alternative to synchronized methods and blocks, which provide implicit locks. The Lock interface declares the lock, unlock, and tryLock methods. The predefined ReentrantLock class implements the Lock interface. To lock a block of code, the following idiom can be used:
Lock lock = new ReentrantLock();
. . .
lock.lock();
try {
  // The code that accesses the shared data
} finally {
  lock.unlock();
}
This skeletal code creates a Lock object and calls the lock method on the Lock object. Then, it uses a try block to enclose the critical code. The call to unlock is in a finally clause to guarantee the lock is released, regardless of what happens in the try block.
There are at least two situations in which explicit locks are used rather than implicit locks: First, if the application needs to try to acquire a lock but cannot wait forever for it, the Lock interface includes a method, tryLock, that takes a time limit parameter. If the lock is not acquired within the time limit, execution continues at the statement following the call to tryLock. Second, explicit locks are used when it is not convenient to have the lock-unlock pairs block structured. Implicit locks are always unlocked at the end of the compound statement in which they are locked. Explicit locks can be unlocked anywhere in the code, regardless of the structure of the program.
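The first situation can be sketched with the timed form of tryLock, which takes a time and a TimeUnit and returns a boolean that indicates whether the lock was acquired. In the following illustration, the class name TryLockDemo, the 50-millisecond limit, and the use of CountDownLatch objects to coordinate the two threads are assumptions.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

// Sketch: trying to acquire a lock with a time limit.
class TryLockDemo {
    // Returns {acquired while another thread held it, acquired afterward}.
    static boolean[] attempt() {
        Lock lock = new ReentrantLock();
        CountDownLatch held = new CountDownLatch(1);     // holder has the lock
        CountDownLatch release = new CountDownLatch(1);  // holder may unlock
        Thread holder = new Thread(() -> {
            lock.lock();
            try {
                held.countDown();
                release.await();            // keep the lock until told
            } catch (InterruptedException e) {
                // fall through to the unlock
            } finally {
                lock.unlock();
            }
        });
        holder.setDaemon(true);
        holder.start();
        boolean whileHeld = true, afterRelease = false;
        try {
            held.await();
            whileHeld = lock.tryLock(50, TimeUnit.MILLISECONDS); // times out
            if (whileHeld) lock.unlock();
            release.countDown();
            holder.join();
            afterRelease = lock.tryLock(50, TimeUnit.MILLISECONDS);
            if (afterRelease) lock.unlock();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new boolean[] { whileHeld, afterRelease };
    }

    public static void main(String[] args) {
        boolean[] r = attempt();
        System.out.println("acquired while held: " + r[0]);
        System.out.println("acquired afterward:  " + r[1]);
    }
}
```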
One danger of using explicit locks, which does not arise with implicit locks, is that of omitting the unlock. Implicit locks are implicitly unlocked at the end of the locked block. However, explicit locks stay locked until explicitly unlocked, which can potentially be never.
As stated previously, each object has an intrinsic condition queue, which stores threads waiting for a condition on the object. The wait, notify, and notifyAll methods are the API for an intrinsic condition queue. Because each object can have just one condition queue, a queue may have threads in it waiting for different conditions. For example, the queue for our buffer example Queue can have threads waiting for either of two conditions (filled == queSize or filled == 0). That is the reason why the buffer uses notifyAll. (If it used notify, only one thread would be awakened, and it might be one that was waiting for a different condition than the one that actually became true.) However, notifyAll is expensive to use, because it awakens all threads waiting on an object and all must check their condition to determine which one runs. Furthermore, to check their condition, they must first acquire the lock on the object.
An alternative to using the intrinsic condition queue is the Condition interface, which uses a condition queue associated with a Lock object. It also declares alternatives to wait, notify, and notifyAll named await, signal, and signalAll. There can be any number of Condition objects with one Lock object. With Condition, signal, rather than signalAll, can be used, which is both easier to understand and more efficient, in part because it results in fewer context switches.
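As a sketch of how Condition objects might replace the intrinsic condition queue in the buffer example, the following class uses one Lock with two conditions, so that a depositing thread signals only threads that are waiting to fetch, and vice versa. The class name ConditionQueue, the condition names notFull and notEmpty, and the helper transfer are hypothetical.

```java
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

// Sketch: a circular queue with one Lock and two Condition objects.
class ConditionQueue {
    private final int[] que;
    private int nextIn = 0, nextOut = 0, filled = 0;
    private final Lock lock = new ReentrantLock();
    private final Condition notFull = lock.newCondition();
    private final Condition notEmpty = lock.newCondition();

    ConditionQueue(int size) {
        que = new int[size];
    }

    void deposit(int item) throws InterruptedException {
        lock.lock();
        try {
            while (filled == que.length)
                notFull.await();            // wait only for free space
            que[nextIn] = item;
            nextIn = (nextIn + 1) % que.length;
            filled++;
            notEmpty.signal();              // wake one waiting fetcher
        } finally {
            lock.unlock();
        }
    }

    int fetch() throws InterruptedException {
        lock.lock();
        try {
            while (filled == 0)
                notEmpty.await();           // wait only for an item
            int item = que[nextOut];
            nextOut = (nextOut + 1) % que.length;
            filled--;
            notFull.signal();               // wake one waiting depositor
            return item;
        } finally {
            lock.unlock();
        }
    }

    // Moves the values 1..n through a small queue and returns their sum.
    static int transfer(int n) {
        ConditionQueue q = new ConditionQueue(4);
        Thread producer = new Thread(() -> {
            try {
                for (int i = 1; i <= n; i++) q.deposit(i);
            } catch (InterruptedException e) {
                // end early if interrupted
            }
        });
        producer.setDaemon(true);
        producer.start();
        int sum = 0;
        try {
            for (int i = 0; i < n; i++) sum += q.fetch();
            producer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println("sum of 1..100 = " + transfer(100));
    }
}
```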
Java’s support for concurrency is relatively simple but effective. All Java run methods are actor tasks and there is no mechanism for communication, except through shared data, as there is among Ada tasks. Because they are heavyweight threads, Ada’s tasks easily can be distributed to different processors; in particular, different processors with different memories, which could be on different computers in different places. These kinds of systems are not possible with Java’s threads.
Although C#’s threads are loosely based on those of Java, there are significant differences. Following is a brief overview of C#’s threads.
Rather than just methods named run, as in Java, any C# method can run in its own thread. When C# threads are created, they are associated with an instance of a predefined delegate, ThreadStart. When execution of a thread is started, its delegate has the address of the method it is supposed to run. So, execution of a thread is controlled through its associated delegate.
A C# thread is created by creating a Thread object. The Thread constructor must be sent an instantiation of ThreadStart, to which must be sent the name of the method that is to run in the thread. For example, we might have
public void MyRun1() { . . . }
. . .
Thread myThread = new Thread(new ThreadStart(MyRun1));
In this example, we create a thread named myThread, whose delegate points to the method MyRun1. So, when the thread begins execution, it calls the method whose address is in its delegate. In this example, the ThreadStart object is the delegate and MyRun1 is the method.
Unlike Java, in which all threads are actors, C# has two categories of threads: actors and servers. Actor threads are not called specifically; rather, they are started. Also, the methods that they execute do not take parameters or return values. As with Java, creating a thread does not start its concurrent execution. For actor threads, execution must be requested through a method of the Thread class, in this case named Start, as in
myThread.Start();
As in Java, a thread can be made to wait for another thread to finish its execution before continuing, using the similarly named method Join. For example, suppose thread A has the following call:
B.Join();
Thread A will be blocked until thread B exits.
The Join method can take an int parameter, which specifies a time limit in milliseconds that the caller will wait for the thread to finish.
A thread can be suspended for a specified amount of time with Sleep, which is a public static method of Thread. The parameter to Sleep is an integer number of milliseconds. Unlike its Java relative, C#’s Sleep does not require exception handling (C# has no checked exceptions), so it need not be called in a try block.
A thread can be terminated with the Abort method, although it does not literally kill the thread. Instead, it throws ThreadAbortException, which the thread can catch. When the thread catches this exception, it usually deallocates any resources it allocated, and then ends (by getting to the end of its code).
A server thread runs only when called through its delegate. These threads are called servers because they provide some service when it is requested. Server threads are more interesting than actor threads because they usually interact with other threads and often must have their execution synchronized with other threads.
Recall from Chapter 9 that any C# method can be called indirectly through a delegate. Such calls can be made by treating the delegate object as if it were the name of the method. This is actually an abbreviation for a call to a delegate method named Invoke. So, if a delegate object’s name is chgfun1 and the method it references takes one int parameter, we could call that method with either of the following statements:
chgfun1(7);
chgfun1.Invoke(7);
These calls are synchronous; that is, when the method is called, the caller is blocked until the method completes its execution. C# also supports asynchronous calls to methods that execute in threads. When a thread is called asynchronously, the called thread and the caller thread execute concurrently, because the caller is not blocked during the execution of the called thread.
A thread is called asynchronously through the delegate instance method BeginInvoke, to which are sent the parameters for the method of the delegate, along with two additional parameters, one of type AsyncCallback and the other of type object. BeginInvoke returns an object that implements the IAsyncResult interface. The delegate class also defines the EndInvoke instance method, which takes one parameter of type IAsyncResult and returns the same type that is returned by the method encapsulated in the delegate object. To call a thread asynchronously, we call it with BeginInvoke. For now, we will use null for the last two parameters. Suppose we have the following method declaration and thread definition:
public float MyMethod1(int x);
. . .
Thread myThread = new Thread(new ThreadStart(MyMethod1));
The following statement calls MyMethod1 asynchronously:
IAsyncResult result = myThread.BeginInvoke(10, null, null);
The return value of the called thread is fetched with the EndInvoke method, which takes as its parameter the object (of type IAsyncResult) returned by BeginInvoke. EndInvoke returns the return value of the called thread. For example, to get the float result of the call to MyMethod1, we would use the following statement:
float returnValue = myThread.EndInvoke(result);
If the caller must continue some work while the called thread executes, it must have a way to determine when the called thread is finished. For this, the IAsyncResult interface defines the IsCompleted property. While the called thread is executing, the caller can include code it can execute in a while loop that depends on IsCompleted. For example, we could have the following:
IAsyncResult result = myThread.BeginInvoke(10, null, null);
while(!result.IsCompleted) {
    // Do some computation
}
This is an effective way to accomplish something in the calling thread while waiting for the called thread to complete its work. However, if the amount of computation in the while loop is relatively small, this is an inefficient way to use that time (because of the time required to test IsCompleted). An alternative is to give the called thread a delegate with the address of a callback method and have it call that method when it is finished. The delegate is sent as the second-to-last parameter to BeginInvoke. For example, consider the following call to BeginInvoke:
IAsyncResult result = myThread.BeginInvoke(10,
    new AsyncCallback(MyMethodComplete), null);
The callback method is defined in the caller. Such methods often simply set a Boolean variable, for example named isDone, to true. No matter how long the called thread takes, the callback method is called only once.
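The asynchronous-call pattern (start the call, poll for completion, fetch the result, and optionally register a callback) can be sketched with Python's concurrent.futures module. This is an analogy rather than the .NET API: submit plays the role of BeginInvoke, done that of IsCompleted, result that of EndInvoke, and add_done_callback that of the AsyncCallback parameter. The names my_method and my_method_complete are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
import threading

def my_method(x):
    """Hypothetical stand-in for MyMethod1: takes an int, returns a float."""
    return x / 4.0

done_event = threading.Event()   # plays the role of the isDone flag

def my_method_complete(future):
    """Callback; invoked exactly once, when the call has finished."""
    done_event.set()

with ThreadPoolExecutor(max_workers=1) as pool:
    result = pool.submit(my_method, 10)            # analogous to BeginInvoke
    result.add_done_callback(my_method_complete)   # analogous to AsyncCallback
    polls = 0
    while not result.done():                       # analogous to IsCompleted
        polls += 1                                 # do some computation
    return_value = result.result()                 # analogous to EndInvoke
    done_event.wait(timeout=5)                     # let the callback finish
```

The polling loop and the callback are shown together only for compactness; a caller would normally choose one of the two.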
There are three different ways that C# threads can be synchronized: the Interlocked class, the Monitor class from the System.Threading namespace, and the lock statement. Each of these mechanisms is designed for a specific need. The Interlocked class is used when the only operations that need to be synchronized are the incrementing and decrementing of an integer. These operations are done atomically with the two methods of Interlocked, Increment and Decrement, both of which take a reference to an integer as the parameter. For example, to increment a shared integer named counter in a thread, we could use
Interlocked.Increment(ref counter);
The lock statement is used to mark a critical section of code in a thread. The syntax of this is as follows:
lock(token) {
    // The critical section
}
If the code to be synchronized is in a private instance method, the token is the current object, so this is used as the token for lock. If the code to be synchronized is in a public instance method, a new instance of object is created (in the class of the method with the code to be synchronized) and a reference to it is used as the token for lock.
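The idea of guarding a critical section with a privately held token can be sketched in Python, where a with statement on a lock object plays the role of C#'s lock statement. Python has no direct counterpart to Interlocked.Increment, so the lock also protects the increment here. The class Counter and its _token attribute are hypothetical names.

```python
import threading

class Counter:
    def __init__(self):
        self._token = threading.Lock()  # private token, not visible to clients
        self.value = 0

    def increment(self):
        with self._token:               # the critical section begins here
            self.value += 1             # read-modify-write is now exclusive

counter = Counter()

def work():
    for _ in range(10_000):
        counter.increment()

threads = [threading.Thread(target=work) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Keeping the token private means that no code outside the class can acquire it, which is the motivation for not locking on a publicly reachable object.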
The Monitor class defines five methods, Enter, Wait, Pulse, PulseAll, and Exit, which can be used to provide more control of the synchronization of threads. The Enter method, which takes an object reference as its parameter, marks the beginning of synchronization of the thread on that object. The Wait method suspends execution of the thread and instructs the Common Language Runtime (CLR) of .NET that this thread wants to resume its execution the next time there is an opportunity. The Pulse method, which also takes an object reference as its parameter, notifies one waiting thread that it now has a chance to run again. PulseAll is similar to Java’s notifyAll. Threads that have been waiting are run in the order in which they called the Wait method. The Exit method ends the critical section of the thread.
The lock statement is compiled into a monitor, so lock is shorthand for a monitor. A monitor is used when the additional control (for example, with Wait and PulseAll) is needed.
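The Wait and PulseAll interplay can be sketched with Python's threading.Condition, whose wait and notify_all methods behave similarly. This is an analogy, not the .NET Monitor API, and the names cond, ready, and waiter are hypothetical.

```python
import threading

cond = threading.Condition()   # plays the role of the monitor's object
ready = False
woken = []

def waiter(name):
    with cond:                 # roughly Monitor.Enter ... Monitor.Exit
        while not ready:       # re-check the condition after every wake-up
            cond.wait()        # roughly Monitor.Wait: release the lock and block
        woken.append(name)

threads = [threading.Thread(target=waiter, args=(i,)) for i in range(3)]
for t in threads:
    t.start()

with cond:
    ready = True
    cond.notify_all()          # roughly Monitor.PulseAll

for t in threads:
    t.join()
```

The while loop around wait matters: a thread that reaches the test after ready has been set skips the wait entirely, so no notification can be missed.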
.NET 4.0 added a collection of generic concurrent data structures, including structures for queues, stacks, and bags.8 These new classes are thread safe, meaning that they can be used in a multithreaded program without requiring the programmer to worry about competition synchronization. The System.Collections.Concurrent namespace defines these classes, whose names are ConcurrentQueue<T>, ConcurrentStack<T>, and ConcurrentBag<T>. So, our producer-consumer queue program could be written in C# using a ConcurrentQueue<T> for the data structure and there would be no need to program the competition synchronization for it. Because these concurrent collections are defined in .NET, they are also available in all of the other .NET languages.
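A producer-consumer program built on a thread-safe queue can be sketched in Python with queue.Queue. This is an analogy rather than ConcurrentQueue<T> itself: the Python queue also blocks on empty and full, so it handles the cooperation synchronization as well. The SENTINEL marker and the function names are assumptions made for the example.

```python
import queue
import threading

buffer = queue.Queue(maxsize=5)   # thread-safe bounded FIFO
SENTINEL = None                   # hypothetical end-of-stream marker
consumed = []

def producer():
    for item in range(20):
        buffer.put(item)          # blocks while the queue is full
    buffer.put(SENTINEL)

def consumer():
    while True:
        item = buffer.get()       # blocks while the queue is empty
        if item is SENTINEL:
            break
        consumed.append(item)

p = threading.Thread(target=producer)
c = threading.Thread(target=consumer)
p.start()
c.start()
p.join()
c.join()
```

Neither thread contains any explicit locking; all synchronization is inside the queue.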
C#’s threads are a slight improvement over those of its predecessor, Java. For one thing, any method can be run in its own thread. Recall that in Java, only methods named run can run in their own threads. Java supports actor threads only, but C# supports both actor and server threads. Thread termination is also cleaner with C# (calling a method (Abort) is more elegant than setting the thread’s pointer to null). Synchronization of thread execution is more sophisticated in C#, because C# has several different mechanisms, each for a specific application. Java’s Lock variables are similar to the locks of C#, except that in Java, a lock must be explicitly unlocked with a call to unlock. This provides one more way to create erroneous code. C# threads, like those of Java, are lightweight, so although they are more efficient, they cannot be as versatile as Ada’s tasks. The availability of the concurrent collection classes is another advantage C# has over the other nonfunctional languages discussed in this chapter.
This section provides a brief overview of support for concurrency in several functional programming languages.
Multi-LISP (Halstead, 1985) is an extension to Scheme that allows the programmer to specify program parts that can be executed concurrently. These forms of concurrency are implicit; the programmer is simply telling the compiler (or interpreter) some parts of the program that can be run concurrently.
One of the ways a programmer can tell the system about possible concurrency is the pcall construct. If a function call is embedded in a pcall construct, the parameters to the function can be evaluated concurrently. For example, consider the following pcall construct:
(pcall f a b c d)
The function is f, with parameters a, b, c, and d. The effect of pcall is that the parameters of the function can be evaluated concurrently (any or all of the parameters could be complicated expressions). Unfortunately, whether this process can be safely used, that is, without affecting the semantics of the function evaluation, is the responsibility of the programmer. This is actually a simple matter if the language does not allow side effects or if the programmer designed the function not to have side effects or at least to have limited ones. However, Multi-LISP does allow some side effects. If the function was not written to avoid side effects, it may be difficult for the programmer to determine whether pcall can be safely used.
The future construct of Multi-LISP is a more interesting and potentially more productive source of concurrency. As with pcall, a function call is wrapped in a future construct. Such a function is evaluated in a separate thread, with the parent thread continuing its execution. The parent thread continues until it needs to use the return value of the function. If the function has not completed its execution when its result is needed, the parent thread waits until it has before it continues.
If a function has two or more parameters, they can also be wrapped in future constructs, in which case their evaluations can be done concurrently in separate threads.
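The behavior of a future (the parent continues and blocks only when it needs the value) can be sketched with Python's concurrent.futures. This is not Multi-LISP; the functions slow_square and f are hypothetical, and both are free of side effects, so concurrent evaluation is safe.

```python
from concurrent.futures import ThreadPoolExecutor

def slow_square(n):
    """Side-effect-free function whose evaluation may proceed concurrently."""
    return n * n

def f(a, b):
    return a + b

with ThreadPoolExecutor() as pool:
    fut_a = pool.submit(slow_square, 3)   # evaluation starts in another thread
    fut_b = pool.submit(slow_square, 4)   # evaluated concurrently with fut_a
    # The parent continues here and blocks only where a value is needed.
    total = f(fut_a.result(), fut_b.result())
```

Wrapping each argument in its own future gives roughly the effect the text describes for parameters wrapped in future constructs.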
These are the only additions to Scheme in Multi-LISP.
Concurrent ML (CML) is an extension to ML that includes a form of threads and a form of synchronous message passing to support concurrency. The language is completely described in Reppy (1999).
A thread is created in CML with the spawn primitive, which takes the function as its parameter. In many cases, the function is specified as an anonymous function. As soon as the thread is created, the function begins its execution in the new thread. The return value of the function is discarded. The function produces its effects either through output or through communication with other threads. Either the parent thread (the one that spawned the new thread) or the child thread (the new one) could terminate first and it would not affect the execution of the other.
Channels provide the means of communicating between threads. A channel is created with the channel constructor. For example, the following statement creates a channel of arbitrary type named mychannel:
let val mychannel = channel()
The two primary operations (functions) on channels are for sending (send) and receiving (recv) messages. The type of the message is inferred from the send operation. For example, the following function call sends the integer value 7, and therefore the type of the channel is then inferred to be integer:
send(mychannel, 7)
The recv function names the channel as its parameter. Its return value is the value it received.
Because CML communications are synchronous, a message is both sent and received only if both the sender and the receiver are ready. If a thread sends a message on a channel and no other thread is ready to receive on that channel, the sender is blocked and waits for another thread to execute a recv on the channel. Likewise, if a recv is executed on a channel by a thread but no other thread has sent a message on that channel, the thread that ran the recv is blocked and waits for a message on that channel.
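Synchronous (rendezvous) message passing of this kind can be approximated in Python. The SyncChannel class below is a hypothetical sketch, not CML's implementation: send returns only after some thread has executed recv on the same channel, and recv blocks until a value has been sent.

```python
import queue
import threading

class SyncChannel:
    """Rendezvous channel sketch: send returns only after a recv took the value."""

    def __init__(self):
        self._slot = queue.Queue(maxsize=1)
        self._send_lock = threading.Lock()   # admit one sender at a time

    def send(self, value):
        with self._send_lock:
            self._slot.put(value)
            self._slot.join()       # block until the receiver calls task_done

    def recv(self):
        value = self._slot.get()    # block until a sender has put a value
        self._slot.task_done()      # release the waiting sender
        return value

mychannel = SyncChannel()
received = []

def child():
    received.append(mychannel.recv())

t = threading.Thread(target=child)
t.start()
mychannel.send(7)                   # blocks until the child has taken the 7
t.join()
```

Whichever side arrives first waits for the other, which is the blocking behavior the paragraph above describes.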
Because channels are types, functions can take them as parameters.
As was the case with Ada’s synchronous message passing, an issue with CML synchronous message passing is deciding which message to choose when more than one channel has received one. And the same solution is used: the guarded command do-od construct that chooses randomly among messages to different channels.
The synchronization mechanism of CML is the event. An explanation of this complicated mechanism is beyond the scope of this chapter (and this book).
Part of the F# support for concurrency is based on the same .NET classes that are used by C#, specifically System.Threading.Thread. For example, suppose we want to run the function myConMethod in its own thread. The following function, when called, will create the thread and start the execution of the function in the new thread:
let createThread() =
    let newThread = new Thread(myConMethod)
    newThread.Start()
Recall that in C#, it is necessary to create an instance of a predefined delegate, ThreadStart, send its constructor the name of the subprogram, and send the new delegate instance as a parameter to the Thread constructor. In F#, if a function expects a delegate as its parameter, a lambda expression or a function can be sent and the compiler will behave as if you sent the delegate. So, in the above code, the function myConMethod is sent as the parameter to the Thread constructor, but what is actually sent is a new instance of ThreadStart (to which was sent myConMethod).
The Thread class defines the Sleep method, which puts the thread from which it is called to sleep for the number of milliseconds that is sent to it as a parameter.
Shared immutable data does not require synchronization among the threads that access it. However, if the shared data is mutable, which is possible in F#, locking will be required to prevent corruption of the shared data by multiple threads attempting to change it. A mutable variable can be locked while a function operates on it to provide synchronized access to the object with the lock function. This function takes two parameters, the first of which is the variable to be changed. The second parameter is a lambda expression that changes the variable.
A mutable heap-allocated variable is of type ref. For example, the following declaration creates such a variable named sum with the initial value of 0:
let sum = ref 0
A ref type variable can be changed in a lambda expression that uses the ALGOL/Pascal/Ada assignment operator, :=. The ref variable must be prefixed with an exclamation point (!) to get its value. In the following, the mutable variable sum is locked while the lambda expression adds the value of x to it:
lock(sum) (fun () -> sum := !sum + x)
Threads can be called asynchronously, just as with C#, using the same subprograms, BeginInvoke and EndInvoke, as well as the IAsyncResult interface to facilitate the determination of the completion of the execution of the asynchronously called thread.
As stated previously, F# has the concurrent generic collections of .NET available to its programs. This can save a great deal of programming effort when building multithreaded programs that need a shared data structure in the form of a queue, stack, or bag.
In this section, we take a brief look at language design for statement-level concurrency. From the language design point of view, the objective of such designs is to provide a mechanism that the programmer can use to inform the compiler of ways it can map the program onto a multiprocessor architecture.9
In this section, only one collection of linguistic constructs from one language for statement-level concurrency is discussed: High-Performance Fortran.
High-Performance Fortran (HPF; ACM, 1993b) is a collection of extensions to Fortran 90 that are meant to allow programmers to specify information to the compiler to help it optimize the execution of programs on multiprocessor computers. HPF includes both new specification statements and intrinsic, or built-in, subprograms. This section discusses only some of the HPF statements.
The primary specification statements of HPF are for specifying the number of processors, the distribution of data over the memories of those processors, and the alignment of data with other data in terms of memory placement. The HPF specification statements appear as special comments in a Fortran program. Each of them is introduced by the prefix !HPF$, where ! is the character used to begin lines of comments in Fortran 90. This prefix makes them invisible to Fortran 90 compilers but easy for HPF compilers to recognize.
The PROCESSORS specification has the following form:
!HPF$ PROCESSORS procs (n)
This statement is used to specify to the compiler the number of processors that can be used by the code generated for this program. This information is used in conjunction with other specifications to tell the compiler how data are to be distributed to the memories associated with the processors.
The DISTRIBUTE and ALIGN specifications are used to provide information to the compiler on machines that do not share memory—that is, each processor has its own memory. The assumption is that an access by a processor to its own memory is faster than an access to the memory of another processor.
The DISTRIBUTE statement specifies what data are to be distributed and the kind of distribution that is to be used. Its form is as follows:
!HPF$ DISTRIBUTE (kind) ONTO procs :: identifier_list
In this statement, kind can be either BLOCK or CYCLIC. The identifier list is the names of the array variables that are to be distributed. A variable that is specified to be BLOCK distributed is divided into n equal groups, where each group consists of contiguous collections of array elements evenly distributed over the memories of all the processors. For example, if an array with 500 elements named LIST is BLOCK distributed over 5 processors, the first 100 elements of LIST will be stored in the memory of the first processor, the second 100 in the memory of the second processor, and so forth. A CYCLIC distribution specifies that individual elements of the array are cyclically stored in the memories of the processors. For example, if LIST is CYCLIC distributed, again over five processors, the first element of LIST will be stored in the memory of the first processor, the second element in the memory of the second processor, and so forth.
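The two distributions amount to two different mappings from an element's index to the processor that stores it. The following Python sketch computes those mappings; the function names are hypothetical, indices and processor numbers are 1-based as in Fortran, and the use of ceiling division for array sizes that do not divide evenly is an assumption.

```python
def block_owner(index, n_elements, n_procs):
    """Processor (1-based) holding element `index` (1-based) under BLOCK."""
    block_size = -(-n_elements // n_procs)   # ceiling division
    return (index - 1) // block_size + 1

def cyclic_owner(index, n_procs):
    """Processor (1-based) holding element `index` (1-based) under CYCLIC."""
    return (index - 1) % n_procs + 1

# The 500-element LIST over 5 processors, as in the text:
first_block = [block_owner(i, 500, 5) for i in (1, 100, 101, 500)]
first_cycle = [cyclic_owner(i, 5) for i in (1, 2, 5, 6)]
```

Under BLOCK, elements 1 through 100 map to processor 1 and element 101 to processor 2; under CYCLIC, element 6 wraps around to processor 1.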
The form of the ALIGN statement is
ALIGN array1_element WITH array2_element
ALIGN is used to relate the distribution of one array with that of another. For example,
ALIGN list1(index) WITH list2(index+1)
specifies that the index element of list1 is to be stored in the memory of the same processor as the index+1 element of list2, for all values of index. The two array references in an ALIGN appear together in some statement of the program. Putting them in the same memory (which means the same processor) ensures that the references to them will be as close as possible.
Consider the following example code segment:
REAL list_1 (1000), list_2 (1000)
INTEGER list_3 (500), list_4 (501)
!HPF$ PROCESSORS procs (10)
!HPF$ DISTRIBUTE (BLOCK) ONTO procs :: list_1, list_2
!HPF$ ALIGN list_3 (index) WITH list_4 (index+1)
. . .
list_1 (index) = list_2 (index)
list_3 (index) = list_4 (index+1)
In each execution of these assignment statements, the two referenced array elements will be stored in the memory of the same processor.
The HPF specification statements provide information for the compiler that it may or may not use to optimize the code it produces. What the compiler actually does depends on its level of sophistication and the particular architecture of the target machine.
The FORALL statement specifies a sequence of assignment statements that may be executed concurrently. For example,
FORALL (index = 1:1000)
   list_1(index) = list_2(index)
END FORALL
specifies the assignment of the elements of list_2 to the corresponding elements of list_1. However, the assignments are restricted to the following order: the right side of all 1,000 assignments must be evaluated first, before any assignments take place. This permits concurrent execution of all of the assignment statements. In addition to assignment statements, FORALL statements can appear in the body of a FORALL construct. The FORALL statement is a good match with vector machines, in which the same instruction is applied to many data values, usually in one or more arrays. The HPF FORALL statement is included in Fortran 95 and subsequent versions of Fortran.
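The rule that every right side is evaluated before any assignment takes place matters when the same array appears on both sides. The Python sketch below contrasts that rule with an ordinary sequential loop for the hypothetical assignment a(i) = a(i-1); the function names are illustrative only.

```python
def forall_shift(values):
    """FORALL-style: evaluate every right side first, then assign."""
    rhs = [values[i - 1] for i in range(1, len(values))]   # all right sides
    result = list(values)
    for i, v in zip(range(1, len(values)), rhs):
        result[i] = v               # assignments use the saved right sides
    return result

def sequential_shift(values):
    """Ordinary loop: each assignment sees the effect of earlier ones."""
    result = list(values)
    for i in range(1, len(result)):
        result[i] = result[i - 1]
    return result

data = [1, 2, 3, 4]
forall_result = forall_shift(data)          # [1, 1, 2, 3]
sequential_result = sequential_shift(data)  # [1, 1, 1, 1]
```

Because the FORALL-style version never reads a value that another assignment in the same statement has written, its assignments can be carried out in any order, or all at once.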
We have briefly discussed only a small part of the capabilities of HPF. However, it should be enough to provide the reader with an idea of the kinds of language extensions that are useful for programming computers with possibly large numbers of processors.
C# 4.0 (and the other .NET languages) includes two methods that behave somewhat like FORALL. They act as loop control statements in which the iterations can be unrolled and the bodies executed concurrently. These are Parallel.For and Parallel.ForEach.
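A loop whose iterations are independent and may run concurrently can be sketched in Python with a thread pool. This is an analogy rather than the .NET Parallel class; the function body and the list names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

list_2 = list(range(1000))

def body(index):
    """Loop body; each iteration is independent of all the others."""
    return list_2[index] * 2

with ThreadPoolExecutor(max_workers=4) as pool:
    # map may run the bodies concurrently but returns results in index order
    list_1 = list(pool.map(body, range(len(list_2))))
```

As with FORALL, the programmer is responsible for ensuring that no iteration depends on the result of another.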
Concurrent execution can be at the instruction, statement, or subprogram level. We use the phrase physical concurrency when multiple processors are actually used to execute concurrent units. If concurrent units are executed on a single processor, we use the term logical concurrency. The underlying conceptual model of all concurrency can be referred to as logical concurrency.
Most multiprocessor computers fall into one of two broad categories—SIMD or MIMD. MIMD computers can be distributed.
Languages that support subprogram-level concurrency must provide two fundamental capabilities: mutually exclusive access to shared data structures (competition synchronization) and cooperation among tasks (cooperation synchronization).
Tasks can be in any one of five different states: new, ready, running, blocked, or dead.
Rather than designing language constructs for supporting concurrency, sometimes libraries, such as OpenMP, are used.
The design issues for language support for concurrency are how competition and cooperation synchronization are provided, how an application can influence task scheduling, how and when tasks start and end their executions, and how and when they are created.
A semaphore is a data structure consisting of an integer and a task description queue. Semaphores can be used to provide both competition and cooperation synchronization among concurrent tasks. It is easy to use semaphores incorrectly, resulting in errors that cannot be detected by the compiler, linker, or run-time system.
Monitors are data abstractions that provide a natural way of providing mutually exclusive access to data shared among tasks. They are supported by several programming languages, among them Ada, Java, and C#. Cooperation synchronization in languages with monitors must be provided with some form of semaphores.
The underlying concept of the message-passing model of concurrency is that tasks send each other messages to synchronize their execution.
Ada provides complex but effective constructs, based on the message-passing model, for concurrency. Ada’s tasks are heavyweight tasks. Tasks communicate with each other through the rendezvous mechanism, which is synchronous message passing. A rendezvous is the action of a task accepting a message sent by another task. Ada includes both simple and complicated methods of controlling the occurrences of rendezvous among tasks.
Ada 95+ includes additional capabilities for the support of concurrency, primarily protected objects. Ada 95+ supports monitors in two ways, with tasks and with protected objects.
Java supports lightweight concurrent units in a relatively simple but effective way. Any class that either inherits from Thread or implements Runnable can override a method named run and have that method’s code executed concurrently with other such methods and with the main program. Competition synchronization is specified by defining methods that access shared data to be implicitly synchronized. Small sections of code can also be implicitly synchronized. A class whose methods are all synchronized is a monitor. Cooperation synchronization is implemented with the methods wait, notify, and notifyAll. The Thread class also provides the sleep, yield, join, and interrupt methods.
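The Java mechanisms summarized above can be illustrated with a minimal sketch. The class names OneSlotBuffer and MonitorDemo and the helper transferSum are hypothetical, chosen only for this illustration; the producer is written as a lambda, which is one way of supplying a Runnable.

```java
// Minimal sketch: a one-slot buffer in which every method is synchronized,
// so the class behaves as a monitor.
class OneSlotBuffer {
    private Integer slot = null;   // null means the slot is empty

    // Competition synchronization: synchronized gives mutually exclusive access.
    public synchronized void put(int value) throws InterruptedException {
        while (slot != null) {     // cooperation: wait until the slot is empty
            wait();
        }
        slot = value;
        notifyAll();               // wake any thread waiting in take()
    }

    public synchronized int take() throws InterruptedException {
        while (slot == null) {     // cooperation: wait until a value arrives
            wait();
        }
        int value = slot;
        slot = null;
        notifyAll();               // wake any thread waiting in put()
        return value;
    }
}

class MonitorDemo {
    // Hypothetical driver: a producer thread sends 1..n and the caller sums them.
    static int transferSum(int n) {
        OneSlotBuffer buffer = new OneSlotBuffer();
        Thread producer = new Thread(() -> {
            try {
                for (int i = 1; i <= n; i++) {
                    buffer.put(i);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();
        int sum = 0;
        try {
            for (int i = 0; i < n; i++) {
                sum += buffer.take();
            }
            producer.join();       // wait for the producer thread to finish
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(transferSum(10));
    }
}
```

The wait calls sit inside while loops so that a thread rechecks its condition after being awakened by notifyAll.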
Java has direct support for counting semaphores through its Semaphore class and its acquire and release methods. It also has some classes for providing nonblocking atomic operations, such as addition, increment, and decrement operations for integers. Java also provides explicit locks with the Lock interface and ReentrantLock class and its lock and unlock methods. In addition to implicit synchronization using synchronized, Java provides implicit nonblocking synchronization of int, long, and boolean type variables, as well as references and arrays. In these cases, atomic getters, setters, add, increment, and decrement operations are provided.
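As a rough sketch of these three facilities, the following hypothetical class ExplicitSyncSketch protects the same counting loop in three ways. The class name, the helper runTwice, and the iteration count are assumptions made for illustration. The semaphore version uses acquireUninterruptibly, a variant of acquire, only to avoid handling InterruptedException.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

class ExplicitSyncSketch {
    private static final int PER_THREAD = 1000;

    // Runs the same body in two threads and waits for both to finish.
    private static void runTwice(Runnable body) {
        Thread t1 = new Thread(body);
        Thread t2 = new Thread(body);
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    // Nonblocking: each increment is one atomic operation.
    static int countWithAtomic() {
        AtomicInteger counter = new AtomicInteger(0);
        runTwice(() -> {
            for (int i = 0; i < PER_THREAD; i++) {
                counter.incrementAndGet();
            }
        });
        return counter.get();
    }

    // Explicit lock: lock and unlock bracket the critical section.
    static int countWithLock() {
        ReentrantLock lock = new ReentrantLock();
        int[] counter = {0};
        runTwice(() -> {
            for (int i = 0; i < PER_THREAD; i++) {
                lock.lock();
                try {
                    counter[0]++;
                } finally {
                    lock.unlock();
                }
            }
        });
        return counter[0];
    }

    // Semaphore with one permit, used for competition synchronization.
    static int countWithSemaphore() {
        Semaphore access = new Semaphore(1);
        int[] counter = {0};
        runTwice(() -> {
            for (int i = 0; i < PER_THREAD; i++) {
                access.acquireUninterruptibly();
                try {
                    counter[0]++;
                } finally {
                    access.release();
                }
            }
        });
        return counter[0];
    }

    public static void main(String[] args) {
        System.out.println(countWithAtomic());
        System.out.println(countWithLock());
        System.out.println(countWithSemaphore());
    }
}
```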
C#’s support for concurrency is based on that of Java but is slightly more sophisticated. Any method can be run in a thread. Both actor and server threads are supported. All threads are controlled through associated delegates. Server threads can be synchronously called with Invoke or asynchronously called with BeginInvoke. A callback method address can be sent to the called thread. Three kinds of thread synchronization are supported with the Interlocked class, which provides atomic increment and decrement operations, the Monitor class, and the lock statement.
All .NET languages have the use of the generic concurrent data structures for stacks, queues, and bags, for which competition synchronization is implicit.
Multi-LISP extends Scheme slightly to allow the programmer to inform the implementation about program parts that can be executed concurrently. Concurrent ML extends ML to support a form of threads and a form of synchronous message passing among those threads. This message passing is designed with channels. F# programs have access to all of the .NET support classes for concurrency. Data shared among threads that is mutable can have access synchronized.
High-Performance Fortran includes statements for specifying how data is to be distributed over the memory units connected to multiple processors. Also included are statements for specifying collections of statements that can be executed concurrently.
The general subject of concurrency is discussed at great length in Andrews and Schneider (1983), Holt et al. (1978), and Ben-Ari (1982).
The monitor concept is developed and its implementation in Concurrent Pascal is described by Brinch Hansen (1977).
The early development of the message-passing model of concurrent unit control is discussed by Hoare (1978) and Brinch Hansen (1978). An in-depth discussion of the development of the Ada tasking model can be found in Ichbiah et al. (1979). Ada 95 is described in detail in ARM (1995). High-Performance Fortran is described in ACM (1993b).
What are the three possible levels of concurrency in programs?
Describe the logical architecture of an SIMD computer.
Describe the logical architecture of an MIMD computer.
What level of program concurrency is best supported by SIMD computers?
What level of program concurrency is best supported by MIMD computers?
Describe the logical architecture of a vector processor.
What is the difference between physical and logical concurrency?
What is a thread of control in a program?
Why are coroutines called quasi-concurrent?
What is a multithreaded program?
What are four reasons for studying language support for concurrency?
What is a heavyweight task? What is a lightweight task?
Define task, synchronization, competition and cooperation synchronization, liveness, race condition, and deadlock.
What kind of tasks do not require any kind of synchronization?
Describe the five different states in which a task can be.
What is a task descriptor?
In the context of language support for concurrency, what is a guard?
What is the purpose of a task-ready queue?
What are the two primary design issues for language support for concurrency?
Describe the actions of the wait and release operations for semaphores.
What is a binary semaphore? What is a counting semaphore?
What are the primary problems with using semaphores to provide synchronization?
What advantage do monitors have over semaphores?
In what three common languages can monitors be implemented?
Define rendezvous, accept clause, entry clause, actor task, server task, extended accept clause, open accept clause, closed accept clause, and completed task.
Which is more general, concurrency through monitors or concurrency through message passing?
Are Ada tasks created statically or dynamically?
What purpose does an extended accept clause serve?
How is cooperation synchronization provided for Ada tasks?
What is the advantage of protected objects in Ada 95 over tasks for providing access to shared data objects?
Specifically, what Java program unit can run concurrently with the main method in an application program?
Are Java threads lightweight or heavyweight tasks?
What does the Java sleep method do?
What does the Java yield method do?
What does the Java join method do?
What does the Java interrupt method do?
What are the two Java constructs that can be declared to be synchronized?
How can the priority of a thread be set in Java?
Can Java threads be actor threads, server threads, or either?
Describe the actions of the three Java methods that are used to support cooperation synchronization.
What kind of Java object is a monitor?
Explain why Java includes the Runnable interface.
What are the two methods used with Java Semaphore objects?
What is the advantage of the nonblocking synchronization in Java?
What are the methods of the Java AtomicInteger class and what is the purpose of this class?
How are explicit locks supported in Java?
What kinds of methods can be run in a C# thread?
Can C# threads be actor threads, server threads, or either?
What are the two ways a C# thread can be called synchronously?
How can a C# thread be called asynchronously?
How is the returned value from an asynchronously called thread retrieved in C#?
What is different about C#’s Sleep method, relative to Java’s sleep?
What exactly does C#’s Abort method do?
What is the purpose of C#’s Interlocked class?
What does the C# lock statement do?
On what language is Multi-LISP based?
What is the semantics of Multi-LISP’s pcall construct?
How is a thread created in CML?
What is the type of an F# heap-allocated mutable variable?
Why don’t F# immutable variables require synchronized access in a multithreaded program?
What is the objective of the specification statements of HPF?
What is the purpose of the FORALL statement of HPF and Fortran?
Explain clearly why competition synchronization is not a problem in a programming environment that supports coroutines but not concurrency.
What is the best action a system can take when deadlock is detected?
Busy waiting is a method whereby a task waits for a given event by continuously checking for that event to occur. What is the main problem with this approach?
In the producer-consumer example of Section 13.3, suppose that we incorrectly replaced the release(access) in the consumer process with wait(access). What would be the result of this error on execution of the system?
From a book on assembly language programming for a computer that uses an Intel Pentium processor, determine what instructions are provided to support the construction of semaphores.
Suppose two tasks, A and B, must use the shared variable Buf_Size. Task A adds 2 to Buf_Size, and task B subtracts 1 from it. Assume that such arithmetic operations are done by the three-step process of fetching the current value, performing the arithmetic, and putting the new value back. In the absence of competition synchronization, what sequences of events are possible and what values result from these operations? Assume that the initial value of Buf_Size is 6.
Compare the Java competition synchronization mechanism with that of Ada.
Compare the Java cooperation synchronization mechanism with that of Ada.
What happens if a monitor procedure calls another procedure in the same monitor?
Explain the relative safety of cooperation synchronization using semaphores and using Ada’s when clauses in tasks.
Write an Ada task to implement general semaphores.
Write an Ada task to manage a shared buffer such as the one in our example, but use the semaphore task from Programming Exercise 1.
Define semaphores in Ada and use them to provide both cooperation and competition synchronization in the shared-buffer example.
Write Programming Exercise 3 using Java.
Write the shared-buffer example of the chapter in C#.
The reader-writer problem can be stated as follows: A shared memory location can be concurrently read by any number of tasks, but when a task must write to the shared memory location, it must have exclusive access. Write a Java program for the reader-writer problem.
Write Programming Exercise 6 using Ada.
Write Programming Exercise 6 using C#.
This chapter discusses programming language support for two related parts of many contemporary programs: exception handling and event handling. Both exceptions and events occur at times that cannot be predetermined, and both are best handled with special language constructs and processes. Some of these constructs and processes—for example, propagation—are similar for exception handling and event handling.
We first describe the fundamental concepts of exception handling, including hardware- and software-detectable exceptions, exception handlers, and the raising of exceptions. Then, the design issues for exception handling are introduced and discussed, including the binding of exceptions to exception handlers, continuation, and default handlers. This section is followed by descriptions and evaluations of the exception-handling facilities of two programming languages: C++ and Java. Brief introductions to exception handling in Python and Ruby are then presented.
The latter part of this chapter is about event handling. We first present an introduction to the basic concepts of event handling. This is followed by discussions of the event-handling approaches of Java and C#.
Most computer hardware systems are capable of detecting certain run-time error conditions, such as floating-point overflow. Early programming languages were designed and implemented in such a way that the user program could neither detect nor attempt to deal with such errors. In these languages, the occurrence of such an error simply causes the program to be terminated and control to be transferred to the operating system. The typical operating system reaction to a run-time error is to display a diagnostic message, which may be meaningful and therefore useful, or highly cryptic. After displaying the message, the program is terminated.
In the case of input and output operations, however, the situation is somewhat different. For example, a Fortran Read statement can intercept input errors and end-of-file conditions, both of which are detected by the input device hardware. In both cases, the Read statement can specify the label of some statement in the user program that deals with the condition. In the case of the end-of-file, the condition obviously is not always considered an error. In most cases, it is nothing more than a signal that one kind of processing is completed and another kind must begin. In spite of the obvious difference between end-of-file and events that are always errors, such as a failed input process, Fortran handles both situations with the same mechanism. Consider the following Fortran Read statement:
Read(Unit=5, Fmt=1000, Err=100, End=999) Weight
The Err clause specifies that control is to be transferred to the statement labeled 100 if an error occurs in the read operation. The End clause specifies that control is to be transferred to the statement labeled 999 if the read operation encounters the end of the file. So, Fortran uses simple branches for both input errors and end-of-file.
There is a category of serious errors that are not detectable by hardware but can be detected by code generated by the compiler. For example, array subscript range errors are almost never detected by hardware, but they lead to serious errors that often are not noticed until later in the program execution.
Detection of subscript range errors is sometimes required by the language design. For example, the Java language specification requires Java compilers to generate code to check the correctness of every subscript expression (they do not generate such code when it can be determined at compile time that a subscript expression cannot have an out-of-range value, for example, if the subscript is a literal). In C, subscript ranges are not checked because the cost of such checking was (and still is) not believed to be worth the benefit of detecting such errors. In some compilers for some languages, subscript range checking can be selected (if not turned on by default) or turned off (if it is on by default) as desired in the program or in the command that executes the compiler.
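The effect of Java's required range checking can be seen in a small sketch. The class and method names below are hypothetical; ArrayIndexOutOfBoundsException is the standard exception Java raises when a subscript is out of range.

```java
class SubscriptCheckSketch {
    // Returns data[index], or -1 when the run-time range check fails.
    // The sentinel -1 is an assumption made only for this illustration.
    static int elementOrMinusOne(int[] data, int index) {
        try {
            return data[index];   // the range check happens here at run time
        } catch (ArrayIndexOutOfBoundsException e) {
            return -1;
        }
    }

    public static void main(String[] args) {
        int[] data = {10, 20, 30};
        System.out.println(elementOrMinusOne(data, 1));
        System.out.println(elementOrMinusOne(data, 3));
    }
}
```

The equivalent out-of-range access in C is not checked, so its behavior is undefined rather than reported.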
The designers of most contemporary languages have included mechanisms that allow programs to react in a standard way to certain run-time errors, as well as other program-detected unusual events. Programs may also be notified when certain events are detected by hardware or system software, so that they also can react to these events.
We consider both the errors detected by hardware, such as disk read errors, and unusual conditions, such as end-of-file (which is also detected by hardware), to be exceptions. We further extend the concept of an exception to include errors or unusual conditions that are software-detectable (by either a software interpreter or the user code itself). Accordingly, we define exception to be any unusual event, erroneous or not, that is detectable by either hardware or software and that may require special processing.
The special processing that may be required when an exception is detected is called exception handling. This processing is done by a code unit or segment called an exception handler. An exception is raised when its associated event occurs. In some C-based languages, exceptions are said to be thrown, rather than raised. Different kinds of exceptions require different exception handlers. Detection of end-of-file nearly always requires some specific program action. But, clearly, that action would not also be appropriate for an array index range error exception. In some cases, the only action is the generation of an error message and an orderly termination of the program.
The absence of separate or specific exception-handling facilities in a language does not preclude the handling of user-defined, software-detected exceptions. Such an exception detected within a program unit is often handled by the unit’s caller. One possible design is to send an auxiliary parameter, which is used as a status variable. The status variable is assigned a value in the called subprogram according to the correctness and/or normalness of the results of its computations. Immediately upon return from the called unit, the caller tests the status variable. If the value indicates that an exception has occurred, the handler, which may reside in the calling unit, can be enacted. Many of the C standard library functions use a variant of this approach: The return values are used as error indicators.
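A minimal sketch of the status-variable design, written in Java only for illustration. Java has no output parameters, so a one-element array stands in for the auxiliary parameter; the class name, the constants OK and DIVIDE_BY_ZERO, and both methods are hypothetical.

```java
class StatusVariableSketch {
    static final int OK = 0;
    static final int DIVIDE_BY_ZERO = 1;

    // The called unit: status[0] plays the role of the auxiliary status parameter.
    static int average(int sum, int total, int[] status) {
        if (total == 0) {
            status[0] = DIVIDE_BY_ZERO;   // report the unusual condition
            return 0;
        }
        status[0] = OK;
        return sum / total;
    }

    // The caller: tests the status immediately after the call returns.
    static String report(int sum, int total) {
        int[] status = new int[1];
        int avg = average(sum, total, status);
        if (status[0] != OK) {
            return "error: divisor is zero";   // the handler resides in the caller
        }
        return "average = " + avg;
    }

    public static void main(String[] args) {
        System.out.println(report(10, 2));
        System.out.println(report(10, 0));
    }
}
```

Nothing forces the caller to perform the test, which is one weakness of this design.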
Another possibility is to pass a label parameter to the subprogram. Of course, this approach is possible only in languages that allow labels to be used as parameters. Passing a label allows the called unit to return to a different point in the caller if an exception has occurred. As in the first alternative, the handler is often a segment of the calling unit’s code. This is a common use of label parameters in Fortran.
A third possibility is to have the handler defined as a separate subprogram whose name is passed as a parameter to the called unit. In this case, the handler subprogram is provided by the caller, but the called unit calls the handler when an exception is raised. One problem with this approach is that one is required to send a handler subprogram with every call to every subprogram that takes a handler subprogram as a parameter, whether it is needed or not. Furthermore, to deal with several different kinds of exceptions, several different handler routines would need to be passed, complicating the code.
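The handler-as-parameter design might look like the following sketch, in which a lambda plays the role of the handler subprogram. The class name, the method average, and the parameter onZeroDivide are hypothetical; IntSupplier is a standard functional interface from java.util.function.

```java
import java.util.function.IntSupplier;

class HandlerParameterSketch {
    // The called unit detects the condition and calls the handler
    // that its caller supplied as a parameter.
    static int average(int sum, int total, IntSupplier onZeroDivide) {
        if (total == 0) {
            return onZeroDivide.getAsInt();
        }
        return sum / total;
    }

    public static void main(String[] args) {
        // Every call must pass a handler, whether or not it turns out to be needed.
        System.out.println(average(10, 2, () -> 0));
        System.out.println(average(10, 0, () -> 0));
    }
}
```

Handling a second kind of exception would require a second handler parameter on every such method, which is the clutter described above.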
If it is desirable to handle an exception in the unit in which it is detected, the handler is included as a segment of code in that unit.
There are some definite advantages to having exception handling built into a language. First, without exception handling, the code required to detect error conditions can considerably clutter a program. For example, suppose a subprogram includes expressions that contain 10 references to elements of a matrix named mat, and any one of them could have an index out-of-range error. Further suppose that the language does not require index range checking. Without built-in index range checking, every one of these operations may need to be preceded by code to detect a possible index range error. For example, consider the following reference to an element of mat, which has 10 rows and 20 columns:
if (row >= 0 && row < 10 && col >= 0 && col < 20)
  sum += mat[row][col];
else
  System.out.println("Index range error on mat, row = " +
                     row + " col = " + col);
The presence of exception handling in the language would permit the compiler to insert machine code for such checks before every array element access, greatly shortening and simplifying the source program.
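For comparison, here is a sketch of the same kind of matrix access in Java, where the implementation performs the range checks and a single handler covers every reference. The class name, the method sumAt, and the use of Integer.MIN_VALUE as an error result are assumptions made for this illustration.

```java
class MatrixSumSketch {
    // Sums mat[row][col] for each {row, col} pair in positions.
    static int sumAt(int[][] mat, int[][] positions) {
        int sum = 0;
        try {
            for (int[] p : positions) {
                sum += mat[p[0]][p[1]];   // no hand-written range test
            }
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("Index range error on mat: " + e.getMessage());
            return Integer.MIN_VALUE;     // hypothetical error result
        }
        return sum;
    }

    public static void main(String[] args) {
        int[][] mat = new int[10][20];
        mat[1][2] = 5;
        mat[3][4] = 7;
        System.out.println(sumAt(mat, new int[][]{{1, 2}, {3, 4}}));
        System.out.println(sumAt(mat, new int[][]{{1, 2}, {10, 0}}));
    }
}
```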
Another advantage of language support for exception handling results from exception propagation. Exception propagation allows an exception raised in one program unit to be handled in some other unit in its dynamic or static ancestry. This allows a single exception handler to be used for any number of different program units. This reuse can result in significant savings in development cost, program size, and program complexity.
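Propagation can be sketched in Java, where an unhandled exception propagates to the caller, that is, along the dynamic ancestry. All names below are hypothetical; NumberFormatException is the standard exception raised by Integer.parseInt.

```java
class PropagationSketch {
    // Neither parsing method has a handler; an exception raised here propagates.
    static int parseQuantity(String text) {
        return Integer.parseInt(text.trim());
    }

    static int parsePrice(String text) {
        return Integer.parseInt(text.trim());
    }

    static int lineTotal(String quantity, String price) {
        return parseQuantity(quantity) * parsePrice(price);
    }

    // The single handler serves every unit called from within the try block.
    static String describe(String quantity, String price) {
        try {
            return "total = " + lineTotal(quantity, price);
        } catch (NumberFormatException e) {
            return "bad input: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(describe("3", "4"));
        System.out.println(describe("3", "x"));
    }
}
```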
A language that supports exception handling encourages its users to consider all of the events that could occur during program execution and how they can be handled. This approach is far better than not considering such possibilities and simply hoping nothing will go wrong.
Finally, there are programs in which dealing with nonerroneous but unusual situations can be simplified with exception handling, and in which program structure can become overly convoluted without it.
We now explore some of the design issues for an exception-handling system when it is part of a programming language. Such a system might allow both predefined and user-defined exceptions and exception handlers. Note that predefined exceptions are implicitly raised, whereas user-defined exceptions must be explicitly raised by user code. Consider the following skeletal subprogram that includes an exception-handling mechanism for an implicitly raised exception:
void example() {
  . . .
  average = sum / total;
  . . .
  return;
  /* Exception handlers */
  when zero_divide {
    average = 0;
    printf("Error–divisor (total) is zero\n");
  }
  . . .
}
The exception of division by zero, which is implicitly raised, causes control to transfer to the appropriate handler, which is then executed.
The first design issue for exception handling is how an exception occurrence is bound to an exception handler. This issue occurs on two different levels. On the unit level, there is the question of how the same exception being raised at different points in a unit can be bound to different handlers within the unit. For example, in the example subprogram, there is a handler for a division-by-zero exception that appears to be written to deal with an occurrence of division by zero in a particular statement (the one shown). But suppose the function includes several other expressions with division operators. For those operators, this handler would probably not be appropriate. So, it should be possible to bind the exceptions that can be raised by particular statements to particular handlers, even though the same exception can be raised by many different statements.
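One way a language can bind the same exception to different handlers within a unit is to attach handlers to individual statements. The sketch below does this in Java with two separate try statements. The names are hypothetical; integer division by zero in Java raises ArithmeticException.

```java
class HandlerBindingSketch {
    static String ratios(int sum, int total, int hits, int attempts) {
        int average;
        try {
            average = sum / total;
        } catch (ArithmeticException e) {
            average = 0;        // handler bound to the first division only
        }

        int hitRate;
        try {
            hitRate = 100 * hits / attempts;
        } catch (ArithmeticException e) {
            hitRate = -1;       // a different handler for the same exception
        }
        return average + " " + hitRate;
    }

    public static void main(String[] args) {
        System.out.println(ratios(10, 0, 1, 2));
        System.out.println(ratios(10, 5, 1, 0));
    }
}
```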
At a higher level, the binding question arises when there is no exception handler local to the unit in which the exception is raised. In this case, the language designer must decide whether to propagate the exception to some other unit and, if so, where. How this propagation takes place and how far it goes have an important impact on the writability of exception handlers. For example, if handlers must be local, then many handlers must be written, which complicates both the writing and reading of the program. On the other hand, if exceptions are propagated, a single handler might handle the same exception raised in several program units, which may require the handler to be more general than one would prefer.
An issue that is related to the binding of an exception to an exception handler is whether information about the exception is made available to the handler.
PL/I (ANSI, 1976) pioneered the concept of allowing user programs to be directly involved in exception handling. The language allowed the user to write exception handlers for a long list of language-defined exceptions. Furthermore, PL/I introduced the concept of user-defined exceptions, which allow programs to create software-detected exceptions. These exceptions use the same mechanisms that are used for the built-in exceptions.
Since PL/I was designed, a substantial amount of work has been done to design alternative methods of exception handling, and exception-handling mechanisms have been included in a long list of subsequent programming languages.
After an exception handler executes, either control can transfer to somewhere in the program outside of the handler code or program execution can simply terminate. We term this the question of control continuation after handler execution, or simply continuation. Termination is obviously the simplest choice, and in many error exception conditions, the best. However, in other situations, particularly those associated with unusual but not erroneous events, the choice of continuing execution is best. This design is called resumption. In these cases, some conventions must be chosen to determine where execution should continue. It might be the statement that raised the exception, the statement after the statement that raised the exception, or possibly some other unit. The choice to return to the statement that raised the exception may seem like a good one, but in the case of an error exception, it is useful only if the handler somehow is able to modify the values or operations that caused the exception to be raised. Otherwise, the exception will simply be reraised. The required modification for an error exception is often very difficult to predict. Even when possible, however, it may not be a sound practice. It allows the program to remove the symptom of a problem without removing the cause.
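As an illustration of the continuation choices, Java does not return control to the statement that raised the exception; after a handler runs, execution continues after the try statement. A retry effect can be approximated by placing the try inside a loop, so that the statement is executed again with a changed operand. The following is a sketch under that assumption, with hypothetical names.

```java
class ContinuationSketch {
    // Returns the first candidate that parses as an integer.
    static int firstValid(String[] candidates) {
        for (String text : candidates) {
            try {
                return Integer.parseInt(text);   // may raise NumberFormatException
            } catch (NumberFormatException e) {
                // The handler does nothing here; control continues after the try,
                // and the loop retries the statement with the next operand.
            }
        }
        throw new IllegalArgumentException("no valid candidate");
    }

    public static void main(String[] args) {
        System.out.println(firstValid(new String[]{"x", "", "42"}));
    }
}
```

Retrying with an unchanged operand would simply raise the same exception again, as noted above.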
The two issues of binding of exceptions to handlers and continuation are illustrated in Figure 14.1.
When exception handling is included, a subprogram’s execution can terminate in two ways: when its execution is complete or when it encounters an exception. In some situations, it is necessary to complete some computation regardless of how subprogram execution terminates. The ability to specify such a computation is called finalization. The choice of whether to support finalization is obviously a design issue for exception handling.
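Java's finally clause is one form of finalization. The sketch below records the order of events so that both termination paths can be compared; the class name, the method run, and the log entries are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class FinalizationSketch {
    static List<String> run(int total) {
        List<String> log = new ArrayList<>();
        try {
            log.add("open");
            try {
                log.add("average " + (100 / total));   // raises ArithmeticException if total is 0
            } finally {
                log.add("close");   // runs on normal completion and on an exception
            }
        } catch (ArithmeticException e) {
            log.add("handled");
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println(run(4));
        System.out.println(run(0));
    }
}
```

In the exceptional case the finally clause executes before the exception reaches the outer handler.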
Another design issue is the following: If users are allowed to define exceptions, how are these exceptions specified? The usual answer is to require that they be declared in the specification parts of the program units in which they can be raised. The scope of a declared exception is usually the scope of the program unit that contains the declaration.
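Languages differ in how they answer this question. As one illustration, Java specifies a user-defined exception as a class derived from Exception, and a method that can raise it lists it in a throws clause. The sketch below uses hypothetical names throughout and also shows information about the exception reaching the handler through its message.

```java
// A user-defined exception, declared as an ordinary class.
class NegativeBalanceException extends Exception {
    NegativeBalanceException(String message) {
        super(message);
    }
}

class UserDefinedExceptionSketch {
    // The throws clause declares that this method may raise the exception.
    static int withdraw(int balance, int amount) throws NegativeBalanceException {
        if (amount > balance) {
            // User-defined exceptions must be raised explicitly by user code.
            throw new NegativeBalanceException("short by " + (amount - balance));
        }
        return balance - amount;
    }

    static String tryWithdraw(int balance, int amount) {
        try {
            return "balance " + withdraw(balance, amount);
        } catch (NegativeBalanceException e) {
            return "refused: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(tryWithdraw(10, 3));
        System.out.println(tryWithdraw(10, 15));
    }
}
```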
In the case where a language provides predefined exceptions, several other design issues follow. For example, should the language run-time system provide default handlers for the built-in exceptions, or should the user be required to write handlers for all exceptions? Another question is whether predefined exceptions can be raised explicitly by the user program. This usage can be convenient if there are software-detectable situations in which the user would like to use a predefined handler.
Another issue is whether hardware-detectable errors can be handled by user programs. If not, all exceptions obviously are software detectable. A related question is whether there should be any predefined exceptions. Predefined exceptions are implicitly raised by either hardware or system software.
The exception-handling design issues can be summarized as follows:
How and where are exception handlers specified, and what is their scope?
How is an exception occurrence bound to an exception handler?
Can information about an exception be passed to the handler?
Where does execution continue, if at all, after an exception handler completes its execution? (This is the question of continuation or resumption.)
Is some form of finalization provided?
How are user-defined exceptions specified?
If there are predefined exceptions, should there be default exception handlers for programs that do not provide their own?
Can predefined exceptions be explicitly raised?
Are hardware-detectable errors treated as exceptions that may be handled?
Are there any predefined exceptions?
We are now in a position to examine the exception-handling facilities of several contemporary programming languages.
The exception handling of C++ was accepted by the ANSI C++ standardization committee in 1990 and subsequently found its way into C++ implementations. The design is based in part on the exception handling of CLU, Ada, and ML.
C++ uses a special construct that is introduced with the reserved word try to specify the scope for exception handlers. A try construct includes a compound statement called the try clause and a list of exception handlers. The compound statement defines the scope of the following handlers. The general form of this construct is as follows:
try {
  //** Code that might raise an exception
}
catch (formal parameter) {
  //** A handler body
}
. . .
catch (formal parameter) {
  //** A handler body
}
Each catch function is an exception handler. A catch function can have only a single formal parameter, which is similar to a formal parameter in a function definition in C++, including the possibility of it being an ellipsis (...). A handler with an ellipsis formal parameter is the catch-all handler; it is enacted for any raised exception if no appropriate handler was found. The formal parameter also can be a naked type specifier, such as float, as in a function prototype. In such a case, the only purpose of the formal parameter is to make the handler uniquely identifiable. When information about the exception is to be passed to the handler, the formal parameter includes a variable name that is used for that purpose. Because the class of the parameter can be any user-defined class, the parameter can include as many data members as are necessary. The binding of exceptions to handlers is discussed in Section 14.3.2.
In C++, exception handlers can include any C++ code.
C++ exceptions are raised only by the explicit statement throw, whose general form in EBNF is as follows:
throw [expression];
The brackets here are metasymbols used to specify that the expression is optional. A throw without an operand can appear only in a handler. When it appears there, it reraises the exception, which is then handled elsewhere.
The type of the throw expression selects the particular handler, which of course must have a “matching” type formal parameter. In this case, matching means the following: A handler with a formal parameter of type T, const T, T& (a reference to an object of type T), or const T& matches a throw with an expression of type T. In the case where T is a class, a handler whose parameter is type T or any class that is an ancestor of T matches. There are more complicated situations in which a throw expression matches a formal parameter, but they are not described here.
An exception raised in a try clause causes an immediate end to the execution of the code in that try clause. The search for a matching handler begins with the handlers that immediately follow the try clause. The matching process is done sequentially on the handlers until a match is found. This means that if any other match precedes an exactly matching handler, the exactly matching handler will not be used. Therefore, handlers for specific exceptions are placed at the top of the list, followed by more generic handlers. The last handler is often one with an ellipsis (...) formal parameter, which matches any exception. This would guarantee that all exceptions are caught.
If an exception is raised in a try clause and there is no matching handler associated with that try clause, the exception is propagated. If the try clause is nested inside another try clause, the exception is propagated to the handlers associated with the outer try clause. If none of the enclosing try clauses yields a matching handler, the exception is propagated to the caller of the function in which it was raised. If the call to the function was not in a try clause, the exception is propagated to that function’s caller. If no matching handler is found in the program through this propagation process, the default handler is called. This handler is further discussed in Section 14.2.4.
After a handler has completed its execution, control flows to the first statement following the try construct (the statement immediately after the last handler in the sequence of handlers of which it is an element). A handler may reraise an exception, using a throw without an expression, in which case that exception is propagated.
In terms of the design issues summarized in Section 14.1.2, the exception handling of C++ is simple. There are only user-defined exceptions, and they are not specified (though they might be declared as new classes). There is a default exception handler, terminate, whose default action is to abort the program. This handler is called for every exception that the program does not catch. It can be replaced by a user-defined handler. The replacement handler must be a function that returns void and takes no parameters. The replacement function is installed by passing its name to set_terminate.
A C++ function can list the types of the exceptions (the types of the throw expressions) that it could raise. This is done by attaching the reserved word throw, followed by a parenthesized list of these types, to the function header. (Exception specifications of this form were deprecated in C++11 and removed from the language in C++17.) For example,
int fun() throw (int, char *) { . . . }
specifies that the function fun could raise exceptions of type int and char * but no others. The purpose of the throw clause is to notify users of the function what exceptions might be raised by the function. The throw clause is in effect a contract between the function and its callers. It guarantees that no other exception will be raised in the function.
If the types in the throw clause are classes, then the function can raise any exception that is derived from the listed classes. If a function header has a throw clause and raises an exception that is not listed in the throw clause and is not derived from a class listed there, the default handler is called. Note that this error cannot be detected at compile time. The list of types may be empty, meaning that the function will not raise any exceptions. If there is no throw specification on the header, the function can raise any exception. The list is not part of the function’s type.
If a function overrides a function that has a throw clause, the overriding function cannot have a throw clause with more exceptions than the overridden function.
Although C++ has no predefined exceptions, the standard libraries define and throw exceptions, such as out_of_range, which can be thrown by library container classes, and overflow_error, which can be thrown by math library functions.
The following example program illustrates some simple uses of exception handlers in C++. The program computes and prints a distribution of input grades by using an array of counters. The input is a sequence of grades, terminated by a negative number. The negative number raises a NegativeInputException exception because the grades must be nonnegative integers. There are 10 categories of grades (0-9, 10-19, . . . , 90-100). The grades themselves are used to compute indexes into an array of counters, one for each grade category. Invalid input grades are detected by trapping indexing errors in the counter array. A grade of 100 is special in the computation of the grade distribution because the categories all have 10 possible grade values, except the highest, which has 11 (90, 91, . . . , 100). (The fact that there are more possible A grades than B’s or C’s is conclusive evidence of the generosity of teachers.) The grade of 100 is also handled in the same exception handler that is used for invalid input data.
// Grade Distribution
// Input: A list of integer values that represent
//        grades, followed by a negative number
// Output: A distribution of grades, as a frequency count for
//         each of the categories 0-9, 10-19, . . .,
//         90-100.
#include <iostream>
using namespace std;

int main() { //* Any exception can be raised
  int new_grade,
      index,
      limit_1,
      limit_2,
      freq[10] = {0,0,0,0,0,0,0,0,0,0};
  // The exception definition to deal with the end of data
  class NegativeInputException {
   public:
    NegativeInputException() { //* Constructor
      cout << "End of input data reached" << endl;
    } //** end of constructor
  }; //** end of NegativeInputException class
  try {
    while (true) {
      cout << "Please input a grade" << endl;
      //* Read the grade first, then test it; a failed read
      //* (such as end of file) also ends the input
      if (!(cin >> new_grade) || new_grade < 0) //* End of data
        throw NegativeInputException();
      index = new_grade / 10;
      {try {
        if (index > 9)
          throw new_grade;
        freq[index]++;
      } //* end of inner try compound
      catch(int grade) { //* Handler for index errors
        if (grade == 100)
          freq[9]++;
        else
          cout << "Error -- new grade: " << grade
               << " is out of range" << endl;
      } //* end of catch(int grade)
      } //* end of the block for the inner try-catch pair
    } //* end of while (true)
  } //* end of outer try block
  catch(NegativeInputException& e) { //**Handler for
                                     //** negative input
    cout << "Limits   Frequency" << endl;
    for (index = 0; index < 10; index++) {
      limit_1 = 10 * index;
      limit_2 = limit_1 + 9;
      if (index == 9)
        limit_2 = 100;
      cout << limit_1 << " - " << limit_2 << "   "
           << freq[index] << endl;
    } //* end of for (index = 0)
  } //* end of catch (NegativeInputException& e)
  return 0;
} //* end of main
This program is meant to illustrate the mechanics of C++ exception handling. Note that the index range exception is often handled in C++ by overloading the indexing operation, which could then raise the exception, rather than the direct detection of the indexing operation with the selection construct used in our example.
One deficiency of exception handling in C++ is that there are no predefined hardware-detectable exceptions that can be handled by the user. Exceptions are connected to handlers through a parameter type in which the formal parameter may be omitted. The type of the formal parameter of a handler determines the condition under which it is called but may have nothing whatsoever to do with the nature of the raised exception. Therefore, the use of predefined types for exceptions certainly does not promote readability. It is much better to define classes for exceptions with meaningful names in a meaningful hierarchy that can be used for defining exceptions. The exception parameter provides a way to pass information about an exception to the exception handler.
In Chapter 13, the Java example program includes the use of exception handling with little explanation. This section describes the details of Java’s exception-handling capabilities.
Java’s exception handling is based on that of C++, but it is designed to be more in line with the object-oriented language paradigm. Furthermore, Java includes a collection of predefined exceptions that are implicitly raised by the Java run-time system.
All Java exceptions are objects of classes that are descendants of the Throwable class. The Java system includes two predefined exception classes that are subclasses of Throwable: Error and Exception. The Error class and its descendants are related to errors that are thrown by the Java run-time system, such as running out of heap memory. These exceptions are never thrown by user programs, and they should never be handled there. Two of the system-defined direct descendants of Exception are RuntimeException and IOException. As its name indicates, IOException is thrown when an error has occurred in an input or output operation; these operations are defined as methods in the various classes of the package java.io.
There are predefined classes that are descendants of RuntimeException. In most cases, a RuntimeException is thrown when a user program causes an error. For example, ArrayIndexOutOfBoundsException, which is defined in java.lang, is a commonly thrown exception that descends from RuntimeException. Another commonly thrown exception that descends from RuntimeException is NullPointerException.
User programs can define their own exception classes. The convention in Java is that user-defined exceptions are subclasses of Exception.
The exception handlers of Java have the same form as those of C++, except that every catch must have a parameter and the class of the parameter must be a descendant of the predefined class Throwable.
The syntax of the try construct in Java is exactly like that of C++, except for the finally clause described in Section 14.3.6.
Throwing an exception is simple. An instance of the exception class is given as the operand of the throw statement. For example, suppose we define an exception named MyException as
class MyException extends Exception {
  public MyException() {}
  public MyException(String message) {
    super(message);
  }
}
This exception can be thrown with the following statement:
throw new MyException();
One of the two constructors we have included in our new class has no parameter and the other has a String object parameter that it sends to the superclass (Exception), which stores it so that a handler can later retrieve it with getMessage. Therefore, our new exception could be thrown with
throw new MyException
  ("a message to specify the location of the error");
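As a minimal sketch of how such a throw fits into a complete program, the following pairs MyException with a caller that catches it. The class ThrowDemo and its methods checkNonNegative and describe are hypothetical names introduced only for this illustration.

```java
// MyException mirrors the definition given in the text.
class MyException extends Exception {
    public MyException() {}
    public MyException(String message) {
        super(message);  // Exception stores the message for getMessage()
    }
}

// Hypothetical demonstration class, not part of the original program.
class ThrowDemo {
    // Rejects negative values by throwing MyException with a message.
    static int checkNonNegative(int value) throws MyException {
        if (value < 0) {
            throw new MyException("negative value passed to checkNonNegative");
        }
        return value;
    }

    // Returns "ok" if no exception was thrown, else the handler's report.
    static String describe(int value) {
        try {
            checkNonNegative(value);
            return "ok";
        } catch (MyException e) {
            return "caught: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(describe(5));
        System.out.println(describe(-1));
    }
}
```

Because MyException descends from Exception rather than RuntimeException, checkNonNegative must name it in a throws clause, while describe need not, since it handles the exception itself.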
The binding of exceptions to handlers in Java is similar to that of C++. If an exception is thrown in the compound statement of a try construct, it is bound to the first handler (catch function) immediately following the try clause whose parameter is the same class as the thrown object, or an ancestor of it. If a matching handler is found, the throw is bound to it and it is executed.
Exceptions can be handled and then rethrown by including, at the end of the handler, a throw statement whose operand is the handler's parameter. (Unlike C++, Java does not allow a throw statement without an operand.) The rethrown exception will not be handled in the same try where it was originally thrown, so looping is not a concern. This rethrowing is usually done when some local action is useful, but further handling by an enclosing try clause or a try clause in the caller is necessary. A throw statement in a handler could also throw some exception other than the one that transferred control to this handler.
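The following sketch shows one way a handler might perform a local action and then rethrow. Note that a Java throw statement always takes an operand, so the handler rethrows by naming its own parameter. The class RethrowDemo, its methods, and the message text are assumptions made for the illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demonstration class.
class RethrowDemo {
    static final List<String> log = new ArrayList<>();

    // Handles the exception locally, then rethrows the same object.
    static void inner() {
        try {
            throw new IllegalStateException("sensor failure");
        } catch (IllegalStateException e) {
            log.add("inner: local cleanup");
            throw e;  // rethrow; handled by a try in the caller
        }
    }

    // The caller's try clause receives the rethrown exception.
    static List<String> run() {
        log.clear();
        try {
            inner();
        } catch (IllegalStateException e) {
            log.add("outer: " + e.getMessage());
        }
        return new ArrayList<>(log);
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

The rethrown object is not offered again to the handlers of the try in inner, which is why the handler there runs only once.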
To ensure that exceptions that can be thrown in a try clause are always handled in a method, a special handler can be written that matches all exceptions that are derived from Exception simply by defining the handler with an Exception type parameter, as in
catch (Exception genericObject) {
  . . .
}
Because a class name always matches itself or any ancestor class, any class derived from Exception matches Exception. Of course, such an exception handler should always be placed at the end of the list of handlers, for it will block the use of any handler that follows it in the try construct in which it appears. This occurs because the search for a matching handler is sequential, and the search ends when a match is found.
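The ordering rule can be sketched as follows, with a specific handler placed ahead of the general Exception handler. The class HandlerOrderDemo and its classify method are hypothetical.

```java
// Hypothetical demonstration class.
class HandlerOrderDemo {
    static String classify(int[] data, int index) {
        try {
            return "value " + data[index];
        } catch (ArrayIndexOutOfBoundsException e) {
            // Specific handler: searched first because it is listed first.
            return "bad index";
        } catch (Exception e) {
            // General handler: reached only when no earlier handler matches,
            // for example for a NullPointerException.
            return "other failure";
        }
    }

    public static void main(String[] args) {
        System.out.println(classify(new int[] {7}, 0));
        System.out.println(classify(new int[] {7}, 3));
        System.out.println(classify(null, 0));
    }
}
```

In practice, the Java compiler enforces this ordering: if the Exception handler were listed first, the handler for ArrayIndexOutOfBoundsException could never be reached, and the compiler would reject the program.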
As part of its reflection facilities, the Java run-time system stores the class name of every object in the program. The method getClass can be used to get an object that stores the class name, which itself can be gotten with the getName method. So, we can retrieve the name of the class of the actual parameter from the throw statement that caused the handler’s execution. For the handler shown earlier, this is done with
genericObject.getClass().getName()
In addition, the message associated with the parameter object, which is created by the constructor, can be gotten with
genericObject.getMessage()
Furthermore, in the case of user-defined exceptions, the thrown object could include any number of data fields that might be useful in the handler.
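These three sources of information (the class name, the message, and user-defined data fields) might be combined in a handler as in the sketch below. The exception class GradeRangeException, its grade field, and the class InspectDemo are hypothetical names chosen for the illustration.

```java
// Hypothetical user-defined exception carrying one extra data field.
class GradeRangeException extends Exception {
    private final int grade;

    GradeRangeException(String message, int grade) {
        super(message);
        this.grade = grade;
    }

    int getGrade() {
        return grade;
    }
}

// Hypothetical demonstration class.
class InspectDemo {
    static String inspect(int grade) {
        try {
            if (grade > 100) {
                throw new GradeRangeException("grade out of range", grade);
            }
            return "accepted";
        } catch (Exception genericObject) {
            // Class name of the thrown object, then its message.
            String summary = genericObject.getClass().getName()
                    + " | " + genericObject.getMessage();
            // The data field is reachable only through the specific type.
            if (genericObject instanceof GradeRangeException) {
                summary += " | " + ((GradeRangeException) genericObject).getGrade();
            }
            return summary;
        }
    }

    public static void main(String[] args) {
        System.out.println(inspect(95));
        System.out.println(inspect(101));
    }
}
```

Note that getName returns the fully qualified class name, so the text before the first separator includes the package name when the class is declared in a package.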
The throws clause of Java has the appearance and placement (in a program) that is similar to that of the throw specification of C++. However, the semantics of throws is somewhat different from that of the C++ throw clause.
The appearance of an exception class name in the throws clause of a Java method specifies that that exception class or any of its descendant exception classes can be thrown but not handled by the method. For example, when a method specifies that it can throw IOException, it means it can throw an IOException object or an object of any of its descendant classes, such as EOFException, and it does not handle the exception it throws.
Exceptions of class Error and RuntimeException and their descendants are called unchecked exceptions. All other exceptions are called checked exceptions. Unchecked exceptions are never a concern of the compiler. However, the compiler ensures that all checked exceptions a method can throw are either listed in its throws clause or handled in the method. Note that checking this at compile time differs from C++, in which it is done at run time. The reason why exceptions of the classes Error and RuntimeException and their descendants are unchecked is that any method could throw them. A program can catch unchecked exceptions, but it is not required.
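The difference between checked and unchecked exceptions can be seen in a small sketch. The class CheckedDemo and its method names are assumptions made for the illustration; IOException and ArrayIndexOutOfBoundsException are the predefined classes discussed in this section.

```java
import java.io.IOException;

// Hypothetical demonstration class.
class CheckedDemo {
    // IOException is checked: the compiler requires this throws clause
    // because the method neither catches nor avoids the exception.
    static void mayFailChecked(boolean fail) throws IOException {
        if (fail) {
            throw new IOException("simulated I/O failure");
        }
    }

    // ArrayIndexOutOfBoundsException is unchecked: no throws clause needed.
    static int mayFailUnchecked(int[] data, int index) {
        return data[index];
    }

    static String run() {
        StringBuilder out = new StringBuilder();
        try {
            mayFailChecked(true);
        } catch (IOException e) {
            out.append("checked: ").append(e.getMessage());
        }
        try {
            mayFailUnchecked(new int[2], 5);
        } catch (ArrayIndexOutOfBoundsException e) {
            // Catching an unchecked exception is permitted but optional.
            out.append("; unchecked caught");
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

Removing the throws clause from mayFailChecked would make the program fail to compile, whereas mayFailUnchecked compiles with or without one.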
As is the case with C++, a method cannot declare more exceptions in its throws clause than the method it overrides, though it may declare fewer. So if a method has no throws clause, neither can any method that overrides it. A method can throw any exception listed in its throws clause, along with any of the descendant classes of those exceptions.
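A brief sketch of the overriding rule follows. The classes BaseLoader, QuietLoader, NarrowLoader, and OverrideDemo are hypothetical; FileNotFoundException is a predefined descendant of IOException.

```java
import java.io.FileNotFoundException;
import java.io.IOException;

// Hypothetical base class whose method declares one checked exception.
class BaseLoader {
    String load() throws IOException {
        return "base";
    }
}

// Declares fewer exceptions than the overridden method: allowed.
class QuietLoader extends BaseLoader {
    @Override
    String load() {
        return "quiet";
    }
}

// Declares a descendant of the overridden method's exception: allowed.
class NarrowLoader extends BaseLoader {
    @Override
    String load() throws FileNotFoundException {
        throw new FileNotFoundException("missing file");
    }
}

// Hypothetical demonstration class.
class OverrideDemo {
    static String run() {
        StringBuilder out = new StringBuilder();
        BaseLoader[] loaders = {
            new BaseLoader(), new QuietLoader(), new NarrowLoader()
        };
        for (BaseLoader loader : loaders) {
            try {
                out.append(loader.load()).append(' ');
            } catch (IOException e) {
                // One handler suffices for every override of load.
                out.append("failed:").append(e.getMessage());
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

The restriction is what makes the single IOException handler in run sufficient: a caller holding a BaseLoader reference can rely on the throws clause of BaseLoader.load no matter which subclass it actually has.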
A method that does not directly throw a particular exception, but calls another method that could throw that exception, must list the exception in its throws clause. This is the reason the buildDist method (in the example in the next subsection), which uses the readLine method, must specify IOException in the throws clause of its header.
A method that does not include a throws clause cannot propagate any checked exception. Recall that in C++, a function without a throw clause can throw any exception.
A method that calls a method that lists a particular checked exception in its throws clause has three alternatives for dealing with that exception: First, it can catch the exception and handle it. Second, it can catch the exception and throw an exception that is listed in its own throws clause. Third, it could declare the exception in its own throws clause and not handle it, which effectively propagates the exception to an enclosing try clause, if there is one, or to the method’s caller, if there is no enclosing try clause.
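The three alternatives can be sketched side by side. Everything named here (AlternativesDemo, readSetting, ConfigException, and the three caller methods) is hypothetical and serves only to illustrate the choices.

```java
import java.io.IOException;

// Hypothetical exception used for the second alternative.
class ConfigException extends Exception {
    ConfigException(String message, Throwable cause) {
        super(message, cause);
    }
}

// Hypothetical demonstration class.
class AlternativesDemo {
    // The called method lists a checked exception in its throws clause.
    static String readSetting(boolean fail) throws IOException {
        if (fail) {
            throw new IOException("disk unavailable");
        }
        return "42";
    }

    // Alternative 1: catch the exception and handle it.
    static String handleLocally(boolean fail) {
        try {
            return readSetting(fail);
        } catch (IOException e) {
            return "default";
        }
    }

    // Alternative 2: catch it and throw an exception listed in
    // this method's own throws clause.
    static String translate(boolean fail) throws ConfigException {
        try {
            return readSetting(fail);
        } catch (IOException e) {
            throw new ConfigException("cannot load setting", e);
        }
    }

    // Alternative 3: declare the exception and let it propagate.
    static String propagate(boolean fail) throws IOException {
        return readSetting(fail);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(handleLocally(true));
        System.out.println(translate(false));
        System.out.println(propagate(false));
    }
}
```

Which alternative is appropriate depends on whether the calling method has enough context to recover; the sketch does not suggest that any one of them is preferable in general.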
There are no default exception handlers, and it is not possible to disable exceptions. Continuation in Java is exactly as in C++.
Following is the Java program with the capabilities of the C++ program in Section 14.2.5:
// Grade Distribution
// Input: A list of integer values that represent
//        grades, followed by a negative number
// Output: A distribution of grades, as a frequency count for
//         each of the categories 0-9, 10-19, . . .,
//         90-100.
import java.io.*;
// The exception definition to deal with the end of data
class NegativeInputException extends Exception {
  public NegativeInputException() {
    System.out.println("End of input data reached");
  } //** end of constructor
} //** end of NegativeInputException class
class GradeDist {
  int newGrade,
      index,
      limit_1,
      limit_2;
  int [] freq = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0};
  void buildDist() throws IOException {
    //** BufferedReader is used here in place of the deprecated
    //** DataInputStream.readLine
    BufferedReader in = new BufferedReader(
                          new InputStreamReader(System.in));
    try {
      while (true) {
        System.out.println("Please input a grade");
        newGrade = Integer.parseInt(in.readLine());
        if (newGrade < 0)
          throw new NegativeInputException();
        index = newGrade / 10;
        try {
          freq[index]++;
        } //** end of inner try clause
        catch(ArrayIndexOutOfBoundsException e) {
          if (newGrade == 100)
            freq[9]++;
          else
            System.out.println("Error - new grade: " +
                   newGrade + " is out of range");
        } //** end of catch (ArrayIndex. . .
      } //** end of while (true) . . .
    } //** end of outer try clause
    catch(NegativeInputException e) {
      System.out.println("\nLimits   Frequency\n");
      for (index = 0; index < 10; index++) {
        limit_1 = 10 * index;
        limit_2 = limit_1 + 9;
        if (index == 9)
          limit_2 = 100;
        System.out.println("" + limit_1 + " - " +
               limit_2 + "   " + freq[index]);
      } //** end of for (index = 0; ...
    } //** end of catch (NegativeInputException ...
  } //** end of method buildDist
  //** Assumed driver: the listing as given ends with buildDist,
  //** so this main method is supplied only to make it runnable
  public static void main(String[] args) throws IOException {
    new GradeDist().buildDist();
  } //** end of main
} //** end of class GradeDist
The exception for a negative input, NegativeInputException, is defined in the program. Its constructor displays a message when an object of the class is created. Its handler produces the output of the method. ArrayIndexOutOfBoundsException is a predefined unchecked exception that is thrown by the Java run-time system. In both of these cases, the handler's parameter, e, is never used in the handler body; Java requires the parameter name to be present, but in neither case does the name serve any purpose. Although all handlers get objects as parameters, the objects often are not useful.
There are some situations in which a process must be executed regardless of whether a try clause throws an exception or the exception is handled in the method. One example of such a situation is a file that must be closed. Another is if the method has some external resource that must be freed in the method regardless of how the execution of the method terminates. The finally clause was designed for these kinds of needs. A finally clause is placed at the end of the list of handlers just after a complete try construct. In general, the try construct and its finally clause appear as
try {
. . .
}
catch (. . .) {
. . .
}
. . . //** More handlers
finally {
. . .
}
The semantics of this construct is as follows: If the try clause throws no exceptions, the finally clause is executed before execution continues after the try construct. If the try clause throws an exception and it is caught by a following handler, the finally clause is executed after the handler completes its execution. If the try clause throws an exception but it is not caught by a handler following the try construct, the finally clause is executed before the exception is propagated.
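The three cases just described can be traced with a small sketch. The class name FinallyOrderDemo, the method run, and its mode parameter are assumptions made for illustration; the outer try construct is there only so that the propagated exception can be observed.

```java
import java.util.ArrayList;
import java.util.List;

class FinallyOrderDemo {
    // Runs a try construct and records the order in which its parts execute.
    // mode 0: no exception is thrown
    // mode 1: an exception is thrown and caught by the following handler
    // mode 2: an exception is thrown that the following handler does not catch
    static List<String> run(int mode) {
        List<String> trace = new ArrayList<>();
        try {
            try {
                trace.add("try");
                if (mode == 1) {
                    throw new IllegalArgumentException("handled here");
                }
                if (mode == 2) {
                    throw new IllegalStateException("not handled here");
                }
            } catch (IllegalArgumentException e) {
                trace.add("handler");
            } finally {
                trace.add("finally");
            }
            trace.add("after construct");
        } catch (IllegalStateException e) {
            // Reached only in mode 2, after the inner finally clause has run.
            trace.add("propagated");
        }
        return trace;
    }

    public static void main(String[] args) {
        for (int mode = 0; mode <= 2; mode++) {
            System.out.println("mode " + mode + ": " + run(mode));
        }
    }
}
```

In each mode the entry "finally" is recorded exactly once, and in mode 2 it is recorded before "propagated", which matches the third case above.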
A try construct with no exception handlers can be followed by a finally clause. This makes sense, of course, only if the compound statement has a throw, break, continue, or return statement. Its purpose in these cases is the same as when it is used with exception handling. For example, consider the following:
try {
for (index = 0; index < 100; index++) {
. . .
if (. . . ) {
return;
} //** end of if
. . .
} //** end of for
} //** end of try clause
finally {
. . .
} //** end of try construct
The finally clause here will be executed, regardless of whether the return terminates the loop or it ends normally.
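A runnable variant of this skeleton might look like the following sketch. The names FinallyOnReturnDemo and scan, the parameters, and the trace list are hypothetical and serve only to make the behavior visible.

```java
import java.util.ArrayList;
import java.util.List;

class FinallyOnReturnDemo {
    // Scans values for target. A match leaves the method with a return from
    // inside the loop; otherwise the loop ends normally. The finally clause
    // runs in both cases.
    static void scan(int[] values, int target, List<String> trace) {
        try {
            for (int index = 0; index < values.length; index++) {
                if (values[index] == target) {
                    trace.add("return from loop");
                    return;
                }
            }
            trace.add("loop ended normally");
        } finally {
            trace.add("finally");
        }
    }

    public static void main(String[] args) {
        List<String> found = new ArrayList<>();
        scan(new int[] {3, 7, 9}, 7, found);
        System.out.println(found);

        List<String> missing = new ArrayList<>();
        scan(new int[] {3, 7, 9}, 4, missing);
        System.out.println(missing);
    }
}
```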
In the discussion of Plankalkül in Chapter 2, we mentioned that it included assertions. Assertions were added to Java in version 1.4. To use them, it is necessary to enable them by running the program with the -enableassertions (or -ea) flag, as in
java -enableassertions MyProgram
There are two possible forms of the assert statement:
assert condition;
assert condition : expression;
In the first case, the condition is tested when execution reaches the assert. If the condition evaluates to true, nothing happens. If it evaluates to false, the AssertionError exception is thrown. In the second case, the action is the same, except that the value of the expression is passed to the AssertionError constructor as a string and becomes debugging output.
The assert statement is used for defensive programming. A program may be written with many assert statements, which ensure that the program’s computation is on track to produce correct results. Many programmers put in such checks when they write a program, as an aid to debugging, even though the language they are using does not support assertions. When the program is sufficiently tested, these checks are removed. The advantage of assert statements, which have the same purpose, is that they can be disabled without removing them from the program. This saves the effort of removing them and also allows their use during subsequent program maintenance.
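As a minimal sketch of both forms of assert, consider the following. The class AssertDemo and its midpoint method are hypothetical. The first assert uses the two-part form, the second uses the condition-only form. What the second call in main prints depends on whether the program is run with assertions enabled.

```java
class AssertDemo {
    // Hypothetical helper: returns a value midway between low and high.
    static int midpoint(int low, int high) {
        // Second form: the expression's value becomes the error message.
        assert low <= high : "low (" + low + ") exceeds high (" + high + ")";
        int mid = low + (high - low) / 2;
        // First form: condition only.
        assert low <= mid && mid <= high;
        return mid;
    }

    public static void main(String[] args) {
        System.out.println(midpoint(10, 20));
        try {
            // With assertions enabled, this call throws AssertionError;
            // with assertions disabled, the checks are skipped.
            System.out.println(midpoint(20, 10));
        } catch (AssertionError e) {
            System.out.println("AssertionError: " + e.getMessage());
        }
    }
}
```

Because the checks stay in the source, the same file serves both for debugging runs (java -ea AssertDemo) and for ordinary runs without the flag.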
The Java mechanisms for exception handling are an improvement over the C++ version on which they are based.
First, a C++ program can throw any type defined in the program or by the system. In Java, only objects that are instances of Throwable or some class that descends from it can be thrown. This separates the objects that can be thrown from all of the other objects (and nonobjects) that inhabit a program. What significance can be attached to an exception that causes an int value to be thrown?
Second, a C++ program unit that does not include a throw clause can throw any exception, which tells the reader nothing. A Java method that does not include a throws clause cannot throw any checked exception that it does not handle. Therefore, the reader of a Java method knows from its header which checked exceptions it could throw but does not handle. A C++ compiler ignores throw clauses, but a Java compiler ensures that all checked exceptions that a method can throw but does not handle are listed in its throws clause.
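The role of the throws clause can be illustrated with a short sketch. The NegativeInputException class below is a simplified stand-in for the one in the earlier example program, and the method names parseGrade and parseGradeOrZero are assumptions made for illustration.

```java
class ThrowsClauseDemo {
    // Simplified stand-in for a user-defined checked exception.
    static class NegativeInputException extends Exception {
        NegativeInputException(String message) {
            super(message);
        }
    }

    // This method can throw a checked exception that it does not handle,
    // so the compiler requires the throws clause in its header.
    static int parseGrade(String text) throws NegativeInputException {
        // Integer.parseInt can throw NumberFormatException, which is
        // unchecked and therefore need not be listed.
        int grade = Integer.parseInt(text);
        if (grade < 0) {
            throw new NegativeInputException("negative grade: " + grade);
        }
        return grade;
    }

    // This method handles the checked exception itself, so its header
    // needs no throws clause.
    static int parseGradeOrZero(String text) {
        try {
            return parseGrade(text);
        } catch (NegativeInputException e) {
            return 0;
        }
    }

    public static void main(String[] args) {
        System.out.println(parseGradeOrZero("87"));
        System.out.println(parseGradeOrZero("-5"));
    }
}
```

Removing the throws clause from parseGrade, or the handler from parseGradeOrZero, causes a compile-time error, which is the check described above.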
Third, the finally clause is a useful addition. It allows cleanup kinds of actions to take place regardless of how a compound statement terminated.
Finally, the Java run-time system implicitly throws a variety of predefined exceptions, such as for array indices out of range and dereferencing null references, which can be handled by any user program. A C++ program can handle only those exceptions that it explicitly throws (or that are thrown by library classes it uses).
C# includes exception-handling constructs that are very much like those of Java, except that C# does not have a throws clause.
This section provides brief overviews of the exception-handling mechanisms of Python and Ruby.
In Python, exceptions are objects. The base class of all exception classes is BaseException, from which the Exception class is derived. BaseException provides some services that are useful for all exception classes, but it is not usually directly subclassed. All predefined exception classes, apart from a few system-exiting ones such as SystemExit and KeyboardInterrupt, are derived from Exception, and user-defined exception classes should also be derived from it. The most commonly used predefined subclasses of Exception are ArithmeticError, whose primary subclasses are OverflowError, ZeroDivisionError, and FloatingPointError, and LookupError, whose main subclasses are IndexError and KeyError.
The statements for dealing with exceptions are similar to those of Java. The general form of a try construct is as follows:
try:
    The try block (the range of statements to be watched for exceptions)
except Exception1:
    Handler for Exception1
except Exception2:
    Handler for Exception2
...
else:
    The else block (what to do when no exception is raised)
finally:
    The finally block (what must be done regardless of what happened)
Both the else and the finally clauses are optional.
One difference between handlers in Java and Python is that Python uses except to introduce them, rather than catch. The else clause is executed if no exception is raised in the try block. The finally clause has the same semantics as its counterpart in Java: If an exception is raised in the try block but is not handled by an immediately following handler, the exception is propagated after the finally block is executed. Because a handler handles its named exception, as well as all subclasses of that exception, a handler that names Exception handles all predefined and user-defined exceptions.
An unhandled exception is propagated to progressively larger enclosing try constructs, searching for an appropriate handler. If none is found, the exception is propagated to the function’s caller, again searching for a handler in a nesting try construct. If no handler is found at any level, the default handler is called, which produces an error message and a stack trace and terminates the program.
The raise statement of Python is similar to the throw statement of Java and C++. The parameter for raise is the class name of the exception to be raised. For example, we could have the following:
raise IndexError
This statement implicitly creates an instance of the named class, IndexError.
An exception handler can gain access to the object of the raised exception by providing an as clause and a variable name, as in the following:
except Exception as ex_obj:
This is a universal handler, as it handles all exceptions. The exception object can be printed with a print statement in the handler, which produces the message of the object. For example, if the exception was ZeroDivisionError, the message would be division by zero.
Python’s assert statement provides a mechanism for making some exception handling optional. The general form of assert is as follows:
assert test, data
In this statement, test is a Boolean flag or expression and data is the value that is sent to the constructor for the exception object to be raised. The meaning of this statement, which optionally raises the AssertionError exception, can be described with the following code:
if __debug__:
    if not test:
        raise AssertionError(data)
__debug__ is a predefined flag that is set to True unless the -O flag (an uppercase letter O, not a zero) is used on the command that runs the program. This allows one to disable all assert statements for a particular run of the program. If an AssertionError exception is not handled by the program, like other unhandled exceptions, it terminates the program after using the default handler.
Python does not have an equivalent to the throws clause of Java.
As in Python, exceptions in Ruby are objects, and Ruby has a large collection of predefined exception classes. All of the exceptions that are handled by application programs are objects of either the StandardError class or a class that descends from it. StandardError is derived from Exception, which provides two useful methods to all its descendants. These are message, which returns the human-readable error message, and backtrace, which returns a stack trace starting from the method where the exception was raised. Some of the predefined subclasses of StandardError are ArgumentError, IndexError, IOError, and ZeroDivisionError.
Exceptions are explicitly raised with the raise method. raise is often called with a string parameter. In this case, it raises a new RuntimeError object with the string as its message. For example, we could have the following:
raise "bad parameter" if count == 0
raise could also have two parameters, the first of which would be an object of an exception class. The exception method of this object is called and the returned Exception object is raised. In this case, the second parameter would be the string message to be displayed. For example, we could have the following:
raise TypeError, "Float parameter expected" if not param.is_a? Float
An exception handler is specified with a rescue clause, which is attached to a statement. To attach an exception handler to a segment of code, the code is placed in a begin-end block. The rescue clause is placed in the block after the code of the block. In general, this appears as in the following:
begin
  The sequence of statements in the block
rescue
  The handler
end
A begin-end block can include an else clause and/or an ensure clause. The else clause is exactly like that of Python. The ensure clause is exactly like a finally clause. A method can act as a container for exception handling in place of a begin-end block.
In a clear departure from most other languages, Ruby allows a segment of code that raised an exception to be rerun after the exception is handled. This is specified with a retry statement at the end of the handler.
Event handling is similar to exception handling. In both cases, the handlers are implicitly called by the occurrence of something, either an exception or an event. While exceptions can be raised either explicitly by user code or implicitly by hardware or a software interpreter, events are created by external actions, such as user interactions through a graphical user interface (GUI). In this section, the fundamentals of event handling, which are less complex than those of exception handling, are introduced.
In conventional (non-event-driven) programming, the program code itself specifies the order in which that code is executed, although the order is usually affected by the program’s input data. In event-driven programming, parts of the program are executed at completely unpredictable times, often triggered by user interactions with the executing program.
The particular kind of event handling discussed in this chapter is related to GUIs. Therefore, most of the events are caused by user interactions through graphical objects or components, often called widgets. The most common widgets are buttons. Implementing reactions to user interactions with GUI components is the most common form of event handling.
An event is a notification that something specific has occurred, such as a mouse click on a graphical button. Strictly speaking, an event is an object that is implicitly created by the run-time system in response to a user action, at least in the context in which event handling is being discussed here.
An event handler is a segment of code that is executed in response to the appearance of an event. Event handlers enable a program to be responsive to user actions.
Although event-driven programming was being used long before GUIs appeared, it has become a widely used programming methodology only in response to the popularity of these interfaces. As an example, consider the GUIs presented to users of Web browsers. Many Web documents presented to browser users are now dynamic. Such a document may present an order form to the user, who chooses the merchandise by clicking buttons. The required internal computations associated with these button clicks are performed by event handlers that react to the click events.
Another common use of event handlers is to check for simple errors and omissions in the elements of a form, either when they are changed or when the form is submitted to the Web server for processing. Using event handling on the browser to check the validity of form data saves the time of sending that data to the server, where their correctness then must be checked by a server-resident program or script before they can be processed. This kind of event-driven programming is often done using a client-side scripting language, such as JavaScript.
In addition to Web applications, non-Web Java applications can present GUIs to users. GUIs in Java applications are discussed in this section.
The initial version of Java provided a somewhat primitive form of support for GUI components. In version 1.2 of the language, released in late 1998, a new collection of components was added. These were collectively called Swing.
The Swing collection of classes and interfaces, defined in javax.swing, includes GUI components, or widgets. Because our interest here is event handling, not GUI components, we discuss only two kinds of widgets: text boxes and radio buttons.
A text box is an object of class JTextField. The simplest JTextField constructor takes a single parameter, the length of the box in characters. For example,
JTextField name = new JTextField(32);
The JTextField constructor can also take a literal string as an optional first parameter. This string parameter, when present, is displayed as the initial contents of the text box.
Radio buttons are special buttons that are placed in a button group container. A button group is an object of class ButtonGroup, whose constructor takes no parameters. In a radio button group, only one button can be pressed at a time. If any button in the group becomes pressed, the previously pressed button is implicitly unpressed. The JRadioButton constructor, used for creating radio buttons, takes two parameters: a label and the initial state of the radio button (true or false, for pressed and not pressed, respectively). If one radio button in a group is initially set to pressed, the others in the group default to unpressed. After the radio buttons are created, they are placed in their button group with the add method of the group object. Consider the following example:
ButtonGroup payment = new ButtonGroup();
JRadioButton box1 = new JRadioButton("Visa", true);
JRadioButton box2 = new JRadioButton("Master Charge");
JRadioButton box3 = new JRadioButton("Discover");
payment.add(box1);
payment.add(box2);
payment.add(box3);
A JFrame object is a frame, which is displayed as a separate window. The JFrame class defines the data and methods that are needed for frames. So, a class that uses a frame can be a subclass of JFrame. A JFrame has several layers called panes. We are interested in just one of those layers, the content pane. Components of a GUI are placed in a JPanel object (a panel), which is used to organize and define the layout of the components. A frame is created and the panel containing the components is added to that frame’s content pane.
Predefined graphic objects, such as GUI components, are placed directly in a panel. The following creates the panel object used in the following discussion of components:
JPanel myPanel = new JPanel();
After the components have been created with constructors, they are placed in the panel with the add method, as in
myPanel.add(button1);
When a user interacts with a GUI component, for example by clicking a button, the component creates an event object and calls an event handler through an object called an event listener, passing the event object. The event handler provides the associated actions. GUI components are event generators. In Java, events are connected to event handlers through event listeners. Event listeners are connected to event generators through event listener registration. Listener registration is done with a method of the class that implements the listener interface, as described later in this section. Only event listeners that are registered for a specific event are notified when that event occurs.
The listener method that receives the message implements an event handler. To make the event-handling methods conform to a standard protocol, an interface is used. An interface prescribes standard method protocols but does not provide implementations of those methods.
A class that needs to implement an event handler must implement an interface for the listener for that handler. There are several classes of events and listener interfaces. One class of events is ItemEvent, which is associated with the event of clicking a checkbox or a radio button, or selecting a list item. The ItemListener interface includes the protocol of a method, itemStateChanged, which is the handler for ItemEvent events. So, to provide an action that is triggered by a radio button click, the interface ItemListener must be implemented, which requires a definition of the method, itemStateChanged.
As stated previously, the connection of a component to an event listener is made with a method of the class that implements the listener interface. For example, because ItemEvent is the class name of event objects created by user actions on radio buttons, the addItemListener method is used to register a listener for radio buttons. The listener for button events created in a panel could be implemented in the panel or a subclass of JPanel. So, for a radio button named button1 in a panel named myPanel that implements the ItemEvent event handler for buttons, we would register the listener with the following statement:
button1.addItemListener(this);
Each event handler method receives an event parameter, which provides information about the event. Event classes have methods to access that information. For example, when called through a radio button, the isSelected method returns true or false, depending on whether the button was on or off (pressed or not pressed), respectively.
All the event-related classes are in the java.awt.event package, so it is imported to any class that uses events.
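The registration and notification steps described above can be sketched without building a window. The class ListenerDemo and its selections list are hypothetical. The sketch relies on the fact that calling setSelected on a Swing button notifies registered item listeners synchronously, so the buttons can be driven from code rather than by mouse clicks.

```java
import java.awt.event.ItemEvent;
import java.awt.event.ItemListener;
import java.util.ArrayList;
import java.util.List;
import javax.swing.ButtonGroup;
import javax.swing.JRadioButton;

class ListenerDemo implements ItemListener {
    // Labels of the buttons that became selected, in order of notification.
    final List<String> selections = new ArrayList<>();
    final JRadioButton plain = new JRadioButton("Plain", true);
    final JRadioButton bold = new JRadioButton("Bold");

    ListenerDemo() {
        ButtonGroup group = new ButtonGroup();
        group.add(plain);
        group.add(bold);
        // Listener registration: this object is the event listener
        // for both event generators.
        plain.addItemListener(this);
        bold.addItemListener(this);
    }

    // The event handler required by the ItemListener interface.
    @Override
    public void itemStateChanged(ItemEvent e) {
        // A change of selection produces a DESELECTED event for the old
        // button and a SELECTED event for the new one; record only the latter.
        if (e.getStateChange() == ItemEvent.SELECTED) {
            JRadioButton source = (JRadioButton) e.getSource();
            selections.add(source.getText());
        }
    }

    public static void main(String[] args) {
        ListenerDemo demo = new ListenerDemo();
        demo.bold.setSelected(true);   // simulates the user choosing Bold
        demo.plain.setSelected(true);  // simulates the user choosing Plain
        System.out.println(demo.selections);
    }
}
```

The handler here inspects the event parameter instead of querying each button with isSelected; either approach identifies the button that is currently pressed.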
The following is an example application, RadioB, that illustrates the use of events and event handling. This application constructs radio buttons that control the font style of the contents of a text field. It creates a Font object for each of four font styles. Each of these has a radio button to enable the user to select the font style.
The purpose of this example is to show how events raised by GUI components can be handled to change the output display of the program dynamically. Because of our narrow focus on event handling, some parts of this program are not explained here.
/* RadioB.java
   An example to illustrate event handling with interactive
   radio buttons that control the font style of a textfield
   */
package radiob;
import java.awt.*;
import java.awt.event.*;
import javax.swing.*;
public class RadioB extends JPanel implements ItemListener {
  private JTextField text;
  private Font plainFont, boldFont, italicFont, boldItalicFont;
  private JRadioButton plain, bold, italic, boldItalic;
  private ButtonGroup radioButtons;
  // The constructor method is where the display is initially
  // built
  public RadioB() {
    // Create the fonts first, so that plainFont exists before
    // it is assigned to the text field
    plainFont = new Font("Serif", Font.PLAIN, 16);
    boldFont = new Font("Serif", Font.BOLD, 16);
    italicFont = new Font("Serif", Font.ITALIC, 16);
    boldItalicFont = new Font("Serif", Font.BOLD + Font.ITALIC, 16);
    // Create the test text string and set its font
    text = new JTextField(
        "In what font style should I appear?", 25);
    text.setFont(plainFont);
    // Create radio buttons for the fonts and add them to
    // a new button group
    plain = new JRadioButton("Plain", true);
    bold = new JRadioButton("Bold");
    italic = new JRadioButton("Italic");
    boldItalic = new JRadioButton("Bold Italic");
    radioButtons = new ButtonGroup();
    radioButtons.add(plain);
    radioButtons.add(bold);
    radioButtons.add(italic);
    radioButtons.add(boldItalic);
    // Create a panel and put the text and the radio
    // buttons in it; then add the panel to this panel,
    // which becomes the content pane of the frame
    JPanel radioPanel = new JPanel();
    radioPanel.add(text);
    radioPanel.add(plain);
    radioPanel.add(bold);
    radioPanel.add(italic);
    radioPanel.add(boldItalic);
    add(radioPanel, BorderLayout.LINE_START);
    // Register the event handlers
    plain.addItemListener(this);
    bold.addItemListener(this);
    italic.addItemListener(this);
    boldItalic.addItemListener(this);
  } // End of the constructor for RadioB
  // The event handler
  public void itemStateChanged (ItemEvent e) {
    // Determine which button is on and set the font
    // accordingly
    if (plain.isSelected())
      text.setFont(plainFont);
    else if (bold.isSelected())
      text.setFont(boldFont);
    else if (italic.isSelected())
      text.setFont(italicFont);
    else if (boldItalic.isSelected())
      text.setFont(boldItalicFont);
  } // End of itemStateChanged
  // The main method
  public static void main(String[] args) {
    // Create the window frame
    JFrame myFrame = new JFrame(" Radio button example");
    // Create the content pane and set it to the frame
    JComponent myContentPane = new RadioB();
    myContentPane.setOpaque(true);
    myFrame.setContentPane(myContentPane);
    // Display the window.
    myFrame.pack();
    myFrame.setVisible(true);
  }
} // End of RadioB
The RadioB.java application produces the screen shown in Figure 14.2.
[Figure 14.2: Output of RadioB.java]
Source: Java radio applet screenshot.
Event handling in C# (and in the other .NET languages) is similar to that of Java. .NET provides two approaches to creating GUIs in applications, the original Windows Forms and the more recent Windows Presentation Foundation. The latter is the more sophisticated and complex of the two. Because our interest is just in event handling, we will use the simpler Windows Forms to discuss our subject.
Using Windows Forms, a C# application that constructs a GUI is created by subclassing the Form predefined class, which is defined in the System.Windows.Forms namespace. This class implicitly provides a window to contain our components. There is no need to build frames or panels explicitly.
文本可以放在Label对象中,单选按钮是该类的对象RadioButton。对象的大小Label未在构造函数中明确指定;而是可以通过将对象AutoSize的数据成员设置为来指定,这会根据放置在其中的内容设置大小。Labeltrue
Text can be placed in a Label object and radio buttons are objects of the RadioButton class. The size of a Label object is not explicitly specified in the constructor; rather it can be specified by setting the AutoSize data member of the Label object to true, which sets the size according to what is placed in it.
Point通过将新对象分配给Location组件的属性,可以将组件放置在窗口中的特定位置。该类在命名空间Point中定义System.Drawing。Point构造函数采用两个参数,即对象的坐标(以像素为单位)。例如,是距离窗口左边缘 1000 像素和距离顶部 1000 像素的Point(100, 200)位置。通过将字符串文字分配给组件的属性来设置组件的标签。创建组件后,通过将其发送到表单子类的方法将其添加到表单窗口。因此,以下代码在输出窗口中的位置创建一个带有标签的单选按钮:100200TextAddControlsPlain(100, 300)
Components can be placed at a particular location in the window by assigning a new Point object to the Location property of the component. The Point class is defined in the System.Drawing namespace. The Point constructor takes two parameters, which are the coordinates of the object in pixels. For example, Point(100, 200) is a position that is 100 pixels from the left edge of the window and 200 pixels from the top. The label of a component is set by assigning a string literal to the Text property of the component. After creating a component, it is added to the form window by sending it to the Add method of the Controls subclass of the form. Therefore, the following code creates a radio button with the label Plain at the (100, 300) position in the output window:
private RadioButton plain = new RadioButton();
plain.Location = new Point(100, 300);
plain.Text = "Plain";
Controls.Add(plain);
All C# event handlers have the same protocol: the return type is void and the two parameters are of types object and EventArgs. Neither parameter needs to be used in a simple situation. An event handler method can have any name. Whether a radio button is currently selected is tested with the button’s Boolean Checked property. Consider the following skeletal example of an event handler:
private void rb_CheckedChanged (object o, EventArgs e){
if (plain.Checked) . . .
. . .
}
To register an event handler, a new EventHandler object must be created. The constructor for this class is sent the name of the handler method. The new object is added to the predefined delegate for the event on the component object (using the += assignment operator). For example, when a radio button changes from unchecked to checked, the CheckedChanged event is raised and the handlers registered on the associated delegate, which is referenced by the name of the event, are called. If the event handler is named rb_CheckedChanged, the following statement registers the handler for the CheckedChanged event on the radio button plain:
plain.CheckedChanged +=
new EventHandler(rb_CheckedChanged);
Following is the RadioB example from Section 14.6 rewritten in C#. Once again, because our focus is on event handling, we do not explain all of the details of the program.
// RadioB.cs
// An example to illustrate event handling with
// interactive radio buttons that control the font
// style of a string of text
namespace RadioB {
using System;
using System.Drawing;
using System.Windows.Forms;
public class RadioB : Form {
private Label text = new Label();
private RadioButton plain = new RadioButton();
private RadioButton bold = new RadioButton();
private RadioButton italic = new RadioButton();
private RadioButton boldItalic = new RadioButton();
// Constructor for RadioB
public RadioB() {
// Initialize the attributes of the text and radio
// buttons
text.AutoSize = true;
text.Text = "In what font style should I appear?";
plain.Location = new Point(220,0);
plain.Text = "Plain";
plain.Checked = true;
bold.Location = new Point(350, 0);
bold.Text = "Bold";
italic.Location = new Point(480, 0);
italic.Text = "Italics";
boldItalic.Location = new Point(610, 0);
boldItalic.Text = "Bold/Italics";
// Add the text and the radio buttons to the form
Controls.Add(text);
Controls.Add(plain);
Controls.Add(bold);
Controls.Add(italic);
Controls.Add(boldItalic);
// Register the event handler for the radio buttons
plain.CheckedChanged +=
new EventHandler(rb_CheckedChanged);
bold.CheckedChanged +=
new EventHandler(rb_CheckedChanged);
italic.CheckedChanged +=
new EventHandler(rb_CheckedChanged);
boldItalic.CheckedChanged +=
new EventHandler(rb_CheckedChanged);
}
// The main method is where execution begins
static void Main() {
Application.EnableVisualStyles();
Application.SetCompatibleTextRenderingDefault (false);
Application.Run(new RadioB());
}
// The event handler
private void rb_CheckedChanged (object o,
EventArgs e) {
// Determine which button is on and set the font
// accordingly
if (plain.Checked)
text.Font =
new Font( text.Font.Name, text.Font.Size,
FontStyle.Regular);
if (bold.Checked)
text.Font =
new Font(text.Font.Name, text.Font.Size,
FontStyle.Bold);
if (italic.Checked)
text.Font =
new Font(text.Font.Name, text.Font.Size,
FontStyle.Italic);
if (boldItalic.Checked)
text.Font =
new Font(text.Font.Name, text.Font.Size,
FontStyle.Bold | FontStyle.Italic);
} // End of rb_CheckedChanged
} // End of RadioB
}
The output from this program is exactly like that shown in Figure 14.2.
C++ includes no predefined exceptions (except those defined in the standard library). C++ exceptions are objects of a primitive type, a predefined class, or a user-defined class. Exceptions are bound to handlers by connecting the type of the expression in the throw statement to that of the formal parameter of the handler. Handlers all have the same name—catch. The C++ throw clause of a method lists the types of exceptions that the method could throw.
Java exceptions are objects whose ancestry must trace back to a class that descends from the Throwable class. There are two categories of exceptions—checked and unchecked. Checked exceptions are a concern for the user program and the compiler. Unchecked exceptions can occur anywhere and are often ignored by user programs.
The Java throws clause of a method lists the checked exceptions that it could throw and does not handle. It must include exceptions that methods it calls could raise and propagate back to its caller.
The Java finally clause provides a mechanism for guaranteeing that some code will be executed regardless of how the execution of a try compound terminates.
Java now includes an assert statement, which facilitates defensive programming.
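The throws and finally clauses summarized above can be sketched in a few lines of Java. This is only an illustration: the class and method names are invented here, and readConfig merely simulates a failure rather than performing real input.

```java
import java.io.IOException;

class ExceptionSummaryDemo {
    // Records the order in which the clauses execute.
    static final StringBuilder log = new StringBuilder();

    // IOException is a checked exception. Because readConfig (a hypothetical
    // method) does not handle it, the compiler requires the throws clause.
    static String readConfig(boolean fail) throws IOException {
        if (fail) {
            throw new IOException("config unavailable");
        }
        return "ok";
    }

    // load handles the checked exception itself, so it needs no throws clause.
    // The finally clause runs however the try compound terminates.
    static String load(boolean fail) {
        try {
            log.append("try;");
            return readConfig(fail);
        } catch (IOException e) {
            log.append("catch;");
            return "default";
        } finally {
            log.append("finally;");
        }
    }

    public static void main(String[] args) {
        // prints: ok default try;finally;try;catch;finally;
        System.out.println(load(false) + " " + load(true) + " " + log);
    }
}
```

Note that the finally clause executes even when the try block exits through a return statement.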
Python’s exception handling is similar to that of Java, although it adds the else clause to the try construct. Also, it uses except clauses rather than catch clauses to define handlers and raise instead of throw. Access to the data of an exception object is gained by assigning the object to a variable with an as clause. Python’s assert statement is a conditional raise.
Exception handling in Ruby is similar to that of Python. Every exception class has two methods, message and backtrace. Exceptions are often raised with a raise statement with a single string parameter. This creates a new RuntimeError object with the string as its message. The raise statement can be made conditional by adding a conditional expression to it. The scope of exception handlers usually is specified with a begin-end block. Handlers are defined in rescue clauses. A begin-end block can include an else clause and an ensure clause, which is like the finally clause of Python and Java.
An event is a notification that something has occurred that requires special processing. Events are often created by user interactions with a program through a graphical user interface. Java event handlers are called through event listeners. An event listener must be registered for an event if it is to be notified when the event occurs. Two of the most commonly used event listener interfaces are ActionListener and ItemListener, whose handler methods are actionPerformed and itemStateChanged, respectively.
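The registration rule can be illustrated without a GUI. In the sketch below, ToggleListener and Toggle are hypothetical stand-ins for a listener interface such as ItemListener and for a Swing component; they are not part of any Java library. The point is only that a listener is notified if, and only if, it has been registered on the event source.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a listener interface such as ItemListener.
interface ToggleListener {
    void toggled(String sourceName, boolean selected);
}

// Hypothetical event source; a Swing component plays this role in a real GUI.
class Toggle {
    private final String name;
    private boolean selected;
    private final List<ToggleListener> listeners = new ArrayList<>();

    Toggle(String name) { this.name = name; }

    // Registration: only listeners added here are notified.
    void addToggleListener(ToggleListener l) { listeners.add(l); }

    void setSelected(boolean value) {
        if (value == selected) return;        // no change, so no event
        selected = value;
        for (ToggleListener l : listeners) {  // raise the event
            l.toggled(name, selected);
        }
    }
}

class ListenerDemo implements ToggleListener {
    final List<String> seen = new ArrayList<>();

    // The event handler
    @Override
    public void toggled(String sourceName, boolean selected) {
        seen.add(sourceName + "=" + selected);
    }

    public static void main(String[] args) {
        ListenerDemo demo = new ListenerDemo();
        Toggle bold = new Toggle("bold");
        Toggle italic = new Toggle("italic");
        bold.addToggleListener(demo);  // registered on bold only
        bold.setSelected(true);
        italic.setSelected(true);      // no listener registered, nothing is notified
        System.out.println(demo.seen); // prints: [bold=true]
    }
}
```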
Windows Forms is the original approach to building GUI components and handling events in .NET languages. A C# application builds a GUI in this approach by subclassing the Form class. All .NET event handlers use the same protocol. An event handler is registered by creating an EventHandler object and adding it, with the += operator, to the predefined delegate associated with the GUI object that can raise the event.
One of the most important papers on exception handling that is not connected with a particular programming language is the work by Goodenough (1975). The problems with the PL/I design for exception handling are covered in MacLaren (1977). Exception handling in C++ is described by Stroustrup (1997). Exception handling in Java is described by Campione et al. (2001).
Review Questions
1. Define exception, exception handler, raising an exception, continuation, finalization, and built-in exception.
2. What are the two alternatives for designing continuation?
3. What are the advantages of having support for exception handling built into a language?
4. What are the design issues for exception handling?
5. What does it mean for an exception to be bound to an exception handler?
6. What is the name of all C++ exception handlers?
7. How can exceptions be explicitly raised in C++?
8. How are exceptions bound to handlers in C++?
9. How can an exception handler be written in C++ so that it handles any exception?
10. Where does execution control go when a C++ exception handler has completed its execution?
11. Does C++ include built-in exceptions?
12. Why is the raising of an exception in C++ not called raise?
13. What is the root class of all Java exception classes?
14. What is the parent class of most Java user-defined exception classes?
15. How can an exception handler be written in Java so that it handles any exception?
16. What are the differences between a C++ throw specification and a Java throws clause?
17. What is the difference between checked and unchecked exceptions in Java?
18. What is the purpose of the Java finally clause?
19. What advantage do language-defined assertions have over simple if-then constructs?
20. Explain what an else block in Python does.
21. What is the purpose of an as clause in Python?
22. Explain what an assert statement in Python does.
23. What does the message method of Ruby’s StandardError class do?
24. What exactly happens when a raise statement with a string parameter is executed?
25. What exactly does a Ruby ensure clause do?
26. In what ways are exception handling and event handling related?
27. Define event and event handler.
28. What is event-driven programming?
29. What is the purpose of a Java JFrame?
30. What is the purpose of a Java JPanel?
31. What object is often used as the event listener in Java GUI applications?
32. What is the origin of the protocol for an event handler in Java?
33. What method is used to register an event handler in Java?
34. Using .NET’s Windows Forms, what namespace is required to build a GUI for a C# application?
35. How is a component positioned in a form using Windows Forms?
36. What is the protocol of a .NET event handler?
37. What class of object must be created to register a .NET event handler?
38. What role do delegates play in the process of registering event handlers?
Problem Set
1. What did the designers of C get in return for not requiring subscript range checking?
2. Describe three approaches to exception handling in languages that do not provide direct support for it.
3. From textbooks on the PL/I and Ada programming languages, look up the respective sets of built-in exceptions. Do a comparative evaluation of the two, considering both completeness and flexibility.
4. From a textbook on COBOL, determine how exception handling is done in COBOL programs.
5. In languages without exception-handling facilities, it is common to have most subprograms include an “error” parameter, which can be set to some value representing “OK” or some other value representing “error in procedure.” What advantage does a linguistic exception-handling facility like that of Java have over this method?
6. In a language without exception-handling facilities, we could send an error-handling procedure as a parameter to each procedure that can detect errors that must be handled. What disadvantages are there to this method?
7. Compare the methods suggested in Problems 5 and 6. Which do you think is better and why?
8. Write a comparative analysis of the throw clause of C++ and the throws clause of Java.
9. Consider the following C++ skeletal program:
class Big {
int i;
float f;
void fun1() throw int {
. . .
try {
. . .
throw i;
. . .
throw f;
. . .
}
catch(float) { . . . }
. . .
}
}
class Small {
int j;
float g;
void fun2() throw float {
. . .
try {
. . .
try {
Big.fun1();
. . .
throw j;
. . .
throw g;
. . .
}
catch(int) { . . . }
. . .
}
catch(float) { . . . }
}
}
In each of the four throw statements, where is the exception handled? Note that fun1 is called from fun2 in class Small.
10. Write a detailed comparison of the exception-handling capabilities of C++ and those of Java.
11. With the help of a book on ML, write a detailed comparison of the exception-handling capabilities of ML and those of Java.
12. Summarize the arguments in favor of the termination and resumption models of continuation.
Programming Exercises
1. Suppose you are writing a C++ function that has three alternative approaches for accomplishing its requirements. Write a skeletal version of this function so that if the first alternative raises any exception, the second is tried, and if the second alternative raises any exception, the third is executed. Write the code as if the three methods were procedures named alt1, alt2, and alt3.
2. Write a Java program that inputs a list of integer values in the range of -100 to 100 from the keyboard and computes the sum of the squares of the input values. This program must use exception handling to ensure that the input values are in range and are legal integers, to handle the error of the sum of the squares becoming larger than a standard Integer variable can store, and to detect end-of-file and use it to cause the output of the result. In the case of overflow of the sum, an error message must be printed and the program terminated.
3. Write a C++ program for the specification of Programming Exercise 2.
4. Revise the Java program of Section 14.3.5 to use EOFException to detect the end of the input.
5. Rewrite the Java code of Section 14.3.6 that uses a finally clause in C++.
This chapter introduces functional programming and some of the programming languages that have been designed for this approach to software development. We begin by reviewing the fundamental ideas of mathematical functions, because functional languages are based on them. Next, the idea of a functional programming language is introduced, followed by a look at the first functional language, Lisp. The next, somewhat lengthy, section is devoted to an introduction to Scheme, including some of its primitive functions, special forms, functional forms, and some examples of simple functions written in Scheme. Next, we provide brief introductions to Common Lisp, ML, Haskell, and F#. Then, we discuss support for functional programming that is included in some imperative languages. The following section describes some of the applications of functional programming languages. Finally, we present a brief comparison of functional and imperative languages.
Most of the earlier chapters of this book have been concerned primarily with the imperative programming languages. The high degree of similarity among the imperative languages arises in part from one of the common bases of their design: the von Neumann architecture, as discussed in Chapter 1. Imperative languages can be thought of collectively as a progression of developments to improve the basic model, which was Fortran I. All have been designed to make efficient use of von Neumann architecture computers. Although the imperative style of programming has been found acceptable by most programmers, its heavy reliance on the underlying architecture is thought by some to be an unnecessary restriction on the alternative approaches to software development.
Other bases for language design exist, some of them oriented more to particular programming paradigms or methodologies than to efficient execution on a particular computer architecture. Thus far, however, only a relatively small minority of programs have been written in nonimperative languages.
The functional programming paradigm, which is based on mathematical functions, is the design basis of the most important nonimperative styles of languages. This style of programming is supported by functional programming languages.
The 1977 ACM Turing Award was given to John Backus for his work in the development of Fortran. Each recipient of this award presents a lecture when the award is formally given, and the lecture is subsequently published in the Communications of the ACM. In his Turing Award lecture, Backus (1978) made a case that purely functional programming languages are better than imperative languages because they result in programs that are more readable, more reliable, and more likely to be correct. The crux of his argument was that purely functional programs are easier to understand, both during and after development, largely because the meanings of expressions are independent of their context (one characterizing feature of a pure functional programming language is that neither expressions nor functions have side effects).
In this lecture, Backus proposed a pure functional language, FP (functional programming), which he used to frame his argument. Although the language did not succeed, at least in terms of achieving widespread use, his idea motivated debate and research on pure functional programming languages. The point here is that some well-known computer scientists have attempted to promote the concept that functional programming languages are superior to the traditional imperative languages, though those efforts have obviously fallen short of their goals. However, over the last decade, prompted in part by the maturing of the typed functional languages, such as ML, Haskell, OCaml, and F#, there has been an increase in the interest in and use of functional programming languages.
One of the fundamental characteristics of programs written in imperative languages is that they have state, which changes throughout the execution process. This state is represented by the program’s variables. The author and all readers of the program must understand the uses of its variables and how the program’s state changes through execution. For a large program, this is a daunting task. This is one problem with programs written in an imperative language that is not present in a program written in a pure functional language, for such programs have neither variables nor state.
Lisp began as a pure functional language but soon acquired some important imperative features that increased its execution efficiency. It is still the most important of the functional languages, at least in the sense that it is the only one that has achieved widespread use. It dominates in the areas of knowledge representation, machine learning, intelligent training systems, and the modeling of speech. Common Lisp is an amalgam of several early 1980s dialects of Lisp.
Scheme is a small, static-scoped dialect of Lisp. Scheme has been widely used to teach functional programming. It is also used in some universities to teach introductory programming courses.
The development of the typed functional programming languages, primarily ML, Haskell, OCaml, and F#, has led to a significant expansion of the areas of computing in which functional languages are now used. As these languages have matured, their practical use is growing. They are now being used in areas such as database processing, financial modeling, statistical analysis, and bioinformatics.
One objective of this chapter is to provide an introduction to functional programming using the core of Scheme, intentionally leaving out its imperative features. Sufficient material on Scheme is included to allow the reader to write some simple but interesting programs. It is difficult to acquire an actual feel for functional programming without some actual programming experience, so that is strongly encouraged.
A mathematical function is a mapping of members of one set, called the domain set, to another set, called the range set. A function definition specifies the domain and range sets, either explicitly or implicitly, along with the mapping. The mapping is described by an expression or, in some cases, by a table. Functions are often applied to a particular element of the domain set, given as a parameter to the function. Note that the domain set may be the cross product of several sets (reflecting that there can be more than one parameter). A function yields an element of the range set.
One of the fundamental characteristics of mathematical functions is that the evaluation order of their mapping expressions is controlled by recursion and conditional expressions, rather than by the sequencing and iterative repetition that are common to programs written in the imperative programming languages.
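The contrast between the two styles of controlling evaluation can be sketched in Java. The factorial function is chosen here only as a familiar illustration, and the class and method names are invented for this sketch.

```java
class EvaluationOrderDemo {
    // Imperative style: sequencing and iteration update a variable step by step.
    static long factorialLoop(int n) {
        long result = 1;
        for (int i = 2; i <= n; i++) {
            result *= i;
        }
        return result;
    }

    // Mathematical style: a conditional expression selects the case and
    // recursion supplies the repetition; no variable is reassigned.
    static long factorial(int n) {
        return (n <= 1) ? 1 : n * factorial(n - 1);
    }

    public static void main(String[] args) {
        // prints: 3628800 3628800
        System.out.println(factorialLoop(10) + " " + factorial(10));
    }
}
```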
Another important characteristic of mathematical functions is that because they have no side effects and cannot depend on any external values, they always map a particular element of the domain to the same element of the range. However, a subprogram in an imperative language may depend on the current values of several nonlocal or global variables. This makes it difficult to determine statically what values the subprogram will produce and what side effects it will have on a particular execution.
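A small Java sketch may make the difference concrete. The names below are invented for illustration: square behaves like a mathematical function, whereas squarePlusCalls depends on, and changes, a nonlocal variable, so the same argument does not always yield the same result.

```java
class SideEffectDemo {
    // Nonlocal state that the impure subprogram reads and modifies.
    static int counter = 0;

    // Behaves like a mathematical function: the result depends only on x.
    static int square(int x) {
        return x * x;
    }

    // Not a mathematical function: the result depends on counter,
    // and each call has the side effect of changing it.
    static int squarePlusCalls(int x) {
        counter++;
        return x * x + counter;
    }

    public static void main(String[] args) {
        System.out.println(square(3) + " " + square(3));                   // prints: 9 9
        System.out.println(squarePlusCalls(3) + " " + squarePlusCalls(3)); // prints: 10 11
    }
}
```

Knowing the argument alone is enough to predict the value of square(3); predicting squarePlusCalls(3) also requires knowing how many calls have already occurred.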
In mathematics, there is no such thing as a variable that models a memory location. Local variables in functions in imperative programming languages maintain the state of the function. Computation is accomplished by evaluating expressions in assignment statements that change the state of the program. In mathematics, there is no concept of the state of a function.
A mathematical function maps its parameter(s) to a value (or values), rather than specifying a sequence of operations on values in memory to produce a value.
Function definitions are often written as a function name, followed by a list of parameters in parentheses, followed by the mapping expression. For example,
$$\text{cube}(x) \equiv x * x * x$$
where x is a real number.
In this definition, the domain and range sets are the real numbers. The symbol $\equiv$ is used to mean “is defined as.” The parameter x can represent any member of the domain set, but it is fixed to represent one specific element during evaluation of the function expression. This is one way the parameters of mathematical functions differ from the variables in imperative languages.
Function applications are specified by pairing the function name with a particular element of the domain set. The range element is obtained by evaluating the function-mapping expression with the domain element substituted for the occurrences of the parameter. Once again, it is important to note that during evaluation, the mapping of a function contains no unbound parameters, where a bound parameter is a name for a particular value. Every occurrence of a parameter is bound to a value from the domain set and is a constant during evaluation. For example, consider the following evaluation of cube(x):
$$\text{cube}(2.0) = 2.0 * 2.0 * 2.0 = 8$$
The parameter x is bound to 2.0 during the evaluation and there are no unbound parameters. Furthermore, x is a constant (its value cannot be changed) during the evaluation.
Early theoretical work on functions separated the task of defining a function from that of naming the function. Lambda notation, as devised by Alonzo Church (1941), provides a method for defining nameless functions. A lambda expression specifies the parameters and the mapping of a function. The lambda expression is the function itself, which is nameless. For example, consider the following lambda expression:
λ(x) x * x * x
Church defined a formal computation model (a formal system for function definition, function application, and recursion) using lambda expressions. This is called lambda calculus. Lambda calculus can be either typed or untyped. Untyped lambda calculus serves as the inspiration for the functional programming languages.
As stated earlier, before evaluation a parameter represents any member of the domain set, but during evaluation it is bound to a particular member. When a lambda expression is evaluated for a given parameter, the expression is said to be applied to that parameter. The mechanics of such an application are the same as for any function evaluation. Application of the example lambda expression is denoted as in the following example:
(λ(x) x * x * x)(2)
which results in the value 8.
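As a hedged illustration, the same application can be written in Scheme, the language described later in this chapter. The sketch assumes that the example lambda expression is the cube mapping used earlier, and it is written in lowercase because many current implementations are case sensitive. The name cube is introduced here only for illustration.

```scheme
; A nameless function applied at the point of its construction.
((lambda (x) (* x x x)) 2)   ; evaluates to 8

; The same mapping bound to a name (cube is a hypothetical name
; chosen for this sketch).
(define cube (lambda (x) (* x x x)))
(cube 2)                     ; evaluates to 8
```

In both forms the parameter x is bound to 2 for the whole evaluation and is never reassigned.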
Lambda expressions, like other function definitions, can have more than one parameter.
A higher-order function, or functional form, is one that either takes one or more functions as parameters or yields a function as its result, or both. One common kind of functional form is function composition, which has two functional parameters and yields a function whose value is the first actual parameter function applied to the result of the second. Function composition is written as an expression, using ∘ as an operator, as in
h ≡ f ∘ g
For example, if f and g are functions of a single parameter, then h is defined as
h(x) ≡ f(g(x))
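Function composition can be sketched in Scheme as a function that takes two functions and yields a new one. The names compose2, f, g, and h below, and the particular bodies chosen for f and g, are assumptions made only for this sketch.

```scheme
; compose2 is a hypothetical helper: it returns a function that
; applies g first and then applies f to the result.
(define (compose2 f g)
  (lambda (x) (f (g x))))

(define (f x) (+ x 2))   ; assumed example function
(define (g x) (* 3 x))   ; assumed example function

(define h (compose2 f g))
(h 4)                    ; (f (g 4)) = (+ (* 3 4) 2) = 14
```

Because compose2 both takes functions as parameters and yields a function as its result, it is a functional form in both senses of the definition.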
Apply-to-all is a functional form that takes a single function as a parameter.¹ If applied to a list of parameters, apply-to-all applies its functional parameter to each of the values in the list parameter and collects the results in a list or sequence. Apply-to-all is denoted by α. Consider the following example:
Let h(x) ≡ x * x
then α(h, (2, 3, 4)) yields (4, 9, 16)
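In Scheme, the standard function map plays the role of apply-to-all. The sketch below also writes the form out by hand to show how the results are collected; the name apply-to-all is hypothetical and is not a predefined Scheme function.

```scheme
(define (h x) (* x x))

; The predefined map applies h to every element of the list.
(map h '(2 3 4))            ; yields (4 9 16)

; A hand-written version (apply-to-all is a name assumed for this sketch).
(define (apply-to-all fn lst)
  (if (null? lst)
      '()
      (cons (fn (car lst))
            (apply-to-all fn (cdr lst)))))

(apply-to-all h '(2 3 4))   ; yields (4 9 16)
```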
There are other functional forms, but these two examples illustrate the basic characteristics of all of them.
The objective of the design of a functional programming language is to mimic mathematical functions to the greatest extent possible. This results in an approach to problem solving that is fundamentally different from approaches used with imperative languages. In an imperative language, an expression is evaluated and the result is stored in a memory location, which is represented as a variable in a program. This is the purpose of assignment statements. This necessary attention to memory cells, whose values represent the state of the program, results in a relatively low-level programming methodology.
A program in an assembly language often must also store the results of partial evaluations of expressions. For example, to evaluate an expression that combines two subexpressions, the value of the first subexpression is computed first. That value must then be stored while the second subexpression is evaluated. The compiler handles the storage of intermediate results of expression evaluations in high-level languages. The storage of intermediate results is still required, but the details are hidden from the programmer.
A purely functional programming language does not use variables or assignment statements, thus freeing the programmer from concerns related to the memory cells, or state, of the program. Without variables, iterative constructs are not possible, for they are controlled by variables. Repetition must be specified with recursion rather than with iteration. Programs are function definitions and function application specifications, and executions consist of evaluating function applications. Without variables, the execution of a purely functional program has no state in the sense of operational and denotational semantics. The execution of a function always produces the same result when given the same parameters. This characteristic is called referential transparency. It makes the semantics of purely functional languages far simpler than the semantics of the imperative languages (and the functional languages that include imperative features). It also makes testing easier, because each function can be tested separately, without any concern for its context.
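The point that repetition is specified with recursion rather than iteration can be illustrated with a small Scheme sketch. The function name sum-to is an assumption made for this example; it uses no assignment and no loop variable.

```scheme
; sum-to is a hypothetical name; it adds the integers from 1 to n.
; Each call binds n once and never changes it.
(define (sum-to n)
  (if (<= n 0)
      0
      (+ n (sum-to (- n 1)))))

(sum-to 4)   ; 4 + 3 + 2 + 1 + 0 = 10
```

Because sum-to depends only on its parameter, every call with the same argument yields the same result, which is the referential transparency described above.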
A functional language provides a set of primitive functions, a set of functional forms to construct complex functions from those primitive functions, a function application operation, and some structure or structures for representing data. These structures are used to represent the parameters and values computed by functions. If a functional language is well designed, it requires only a relatively small number of primitive functions.
As we have seen in earlier chapters, the first functional programming language, Lisp, uses a syntactic form for both data and code that is very different from that of the imperative languages. However, many functional languages designed later use syntax for their code that is similar to that of the imperative languages.
Although there are a few purely functional languages, for example, Haskell, most of the languages that are called functional include some imperative features, for example, mutable variables and constructs that act as assignment statements.
Some concepts and constructs that originated in functional languages, such as lazy evaluation and anonymous subprograms, have now found their way into some languages that are considered imperative.
Although early functional languages were often implemented with interpreters, many programs written in functional programming languages are now compiled.
Many functional programming languages have been developed. The oldest and most widely used is Lisp (or one of its descendants), which was developed by John McCarthy at MIT in 1959. Studying functional languages through Lisp is somewhat akin to studying the imperative languages through Fortran: Lisp was the first functional language, but although it has steadily evolved for half a century, it no longer represents the latest design concepts for functional languages. In addition, with the exception of the first version, all Lisp dialects include imperative-language features, such as imperative-style variables, assignment statements, and iteration. (Imperative-style variables are used to name memory cells, whose values can change many times during program execution.) Despite this and their somewhat odd form, the descendants of the original Lisp represent well the fundamental concepts of functional programming and are therefore worthy of study.
There were only two categories of data objects in the original Lisp: atoms and lists. List elements are pairs, where the first part is the data of the element, which is a pointer to either an atom or a nested list. The second part of a pair can be a pointer to an atom, a pointer to another element, or a special value, nil. Elements are linked together in lists with the second parts. Atoms and lists are not types in the sense that imperative languages have types. In fact, the original Lisp was a typeless language. Atoms are either symbols, in the form of identifiers, or numeric literals.
Recall from Chapter 2 that Lisp originally used lists as its data structure because they were thought to be an essential part of list processing. As it eventually developed, however, Lisp rarely requires the general list operations of insertion and deletion at positions other than the beginning of a list.
Lists are specified in Lisp by delimiting their elements with parentheses. The elements of simple lists are restricted to atoms, as in
(A B C D)
Nested list structures are also specified by parentheses. For example, the list
(A (B C) D (E (F G)))
is a list of four elements. The first is the atom A; the second is the sublist (B C); the third is the atom D; the fourth is the sublist (E (F G)), which has as its second element the sublist (F G).
In a Lisp implementation, a list is usually stored as a linked-list structure in which each node has two pointers, one to reference the data of the node and the other to form the linked list. A list is referenced by a pointer to its first element.
The internal representations of our two example lists are shown in Figure 15.1. Note that the elements of a list are shown horizontally. The last element of a list has no successor, so its link is nil. Sublists are shown with the same structure.
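The two-pointer node structure can be made visible by building the example lists one node at a time with CONS, which is discussed later in the chapter. This is a minimal sketch in lowercase Scheme; the names simple and nested are assumptions, and the symbols are written in lowercase because many current implementations are case sensitive.

```scheme
; Each cons creates one node: its first part refers to the data,
; its second part links to the rest of the list ('() marks the end).
(define simple
  (cons 'a (cons 'b (cons 'c (cons 'd '())))))

(equal? simple '(a b c d))   ; #t

; The nested list (a (b c) d (e (f g))) uses the same node structure;
; a sublist is simply the data part of a node.
(define nested
  (cons 'a
        (cons (cons 'b (cons 'c '()))
              (cons 'd
                    (cons (cons 'e (cons (cons 'f (cons 'g '())) '()))
                          '())))))

(equal? nested '(a (b c) d (e (f g))))   ; #t
```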
The original intent of Lisp’s design was to have a notation for programs that would be as close to Fortran’s as possible, with additions when necessary. This notation was called M-notation, for meta-notation. There was to be a compiler that would translate programs written in M-notation into semantically equivalent machine code programs for the IBM 704.
Early in the development of Lisp, McCarthy wrote a paper to promote list processing as an approach to general symbolic processing. McCarthy believed that list processing could be used to study computability, which at the time was usually studied using Turing machines, which are based on the imperative model of computation. McCarthy thought that the functional processing of symbolic lists was a more natural model of computation than Turing machines, which operated on symbols written on tapes, which represented state. One of the common requirements of the study of computation is that one must be able to prove certain computability characteristics of the whole class of whatever model of computation is being used. In the case of the Turing machine model, one can construct a universal Turing machine that can mimic the operations of any other Turing machine. From this concept came the idea of constructing a universal Lisp function that could evaluate any other function in Lisp.
The first requirement for the universal Lisp function was a notation that allowed functions to be expressed in the same way data was expressed. The parenthesized list notation described in Section 15.4.1 had already been adopted for Lisp data, so it was decided to invent conventions for function definitions and function calls that could also be expressed in list notation. Function calls were specified in a prefix list form originally called Cambridge Polish,² as in the following:
(function_name argument_1 ... argument_n)
For example, if + is a function that takes two or more numeric parameters, the following two expressions evaluate to 12 and 20, respectively:
(+ 5 7)
(+ 3 4 7 6)
The lambda notation described in Section 15.2.1 was chosen to specify function definitions. It had to be modified, however, to allow the binding of functions to names so that functions could be referenced by other functions and by themselves. This name binding was specified by a list consisting of the function name and a list containing the lambda expression, as in
(function_name (LAMBDA (arg_1 ... arg_n) expression))
If you have had no prior exposure to functional programming, it may seem odd to even consider a nameless function. However, nameless functions are sometimes useful in functional programming (as well as in mathematics and imperative programming). For example, consider a function whose action is to produce a function for immediate application to a parameter list. The produced function has no need for a name, for it is applied only at the point of its construction. Such an example is given in Section 15.5.14.
Lisp functions specified in this new notation were called S-expressions, for symbolic expressions. Eventually, all Lisp structures, both data and code, were called S-expressions. An S-expression can be either a list or an atom. We will usually refer to S-expressions simply as expressions.
McCarthy successfully developed a universal function that could evaluate any other function. This function was named EVAL and was itself in the form of an expression. Two of the people in the AI Project, which was developing Lisp, Stephen B. Russell and Daniel J. Edwards, noticed that an implementation of EVAL could serve as a Lisp interpreter, and they promptly constructed such an implementation (McCarthy et al., 1965).
There were several important results of this quick, easy, and unexpected implementation. First, all early Lisp implementations copied EVAL and were therefore interpretive. Second, the definition of M-notation, which was the planned programming notation for Lisp, was never completed or implemented, so S-expressions became Lisp’s only notation. The use of the same notation for data and code has important consequences, one of which will be discussed in Section 15.5.14. Third, much of the original language design was effectively frozen, keeping certain odd features in the language, such as the conditional expression form and the use of () for both the empty list and logical false.
Another feature of early Lisp systems that was apparently accidental was the use of dynamic scoping. Functions were evaluated in the environments of their callers. No one at the time knew much about scoping, and there may have been little thought given to the choice. Dynamic scoping was used for most dialects of Lisp before 1975. Contemporary dialects either use static scoping or allow the programmer to choose between static and dynamic scoping.
An interpreter for Lisp can be written in Lisp. Such an interpreter, which is not a large program, describes the operational semantics of Lisp, in Lisp. This is vivid evidence of the semantic simplicity of the language.
In this section, we describe the core part of Scheme (Dybvig, 2011). We have chosen Scheme because it is relatively simple, it is popular in colleges and universities, and Scheme interpreters are readily available (and free) for a wide variety of computers. The version of Scheme described in this section is Scheme 4. Note that this section covers only a small part of Scheme, and it includes none of Scheme’s imperative features.
The Scheme language, which is a dialect of Lisp, was developed at MIT in the mid-1970s (Sussman and Steele, 1975). It is characterized by its small size, its exclusive use of static scoping, and its treatment of functions as first-class entities. As first-class entities, Scheme functions can be the values of expressions, elements of lists, passed as parameters, and returned from functions. Early versions of Lisp did not provide all of these capabilities.
As an essentially typeless small language with simple syntax and semantics, Scheme is well suited to educational applications, such as courses in functional programming, and also to general introductions to programming.
Most of the Scheme code in the following sections would require only minor modifications to be converted to valid Lisp code.
A Scheme interpreter in interactive mode is an infinite read-evaluate-print loop (often abbreviated as REPL). It repeatedly reads an expression typed by the user (in the form of a list), interprets the expression, and displays the resulting value. This form of interpreter is also used by Ruby and Python. Expressions are interpreted by the function EVAL. Literals evaluate to themselves. So, if you type a number to the interpreter, it simply displays the number. Expressions that are calls to primitive functions are evaluated in the following way: First, each of the parameter expressions is evaluated, in no particular order. Then, the primitive function is applied to the parameter values, and the resulting value is displayed.
Of course, Scheme programs that are stored in files can be loaded and interpreted.
Comments in Scheme are any text following a semicolon on any line.
Scheme includes primitive functions for the basic arithmetic operations. These are +, -, *, and /, for add, subtract, multiply, and divide. * and + can have zero or more parameters. If * is given no parameters, it returns 1; if + is given no parameters, it returns 0. + adds all of its parameters together. * multiplies all its parameters together. / and - can have two or more parameters. In the case of subtraction, all but the first parameter are subtracted from the first. Division is similar to subtraction. Some examples are:
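A few expressions that follow directly from these rules are sketched below, written in lowercase Scheme, with the value each one evaluates to shown in a comment.

```scheme
(+)              ; 0, the sum of no parameters
(*)              ; 1, the product of no parameters
(* 3 7)          ; 21
(+ 5 7 8)        ; 20
(- 5 6)          ; -1
(- 15 7 2)       ; 6, both 7 and 2 are subtracted from 15
(- 24 (* 4 3))   ; 12, the parameter (* 4 3) is evaluated first
(/ 24 4 3)       ; 2, 24 is divided by 4 and then by 3
```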
There are a large number of other numeric functions in Scheme, among them MODULO, ROUND, MAX, MIN, LOG, SIN, and SQRT. SQRT returns the square root of its numeric parameter, if the parameter’s value is not negative. If the parameter is negative, SQRT yields a complex number.
In Scheme, note that we use uppercase letters for all reserved words and predefined functions. The official definition of the language specifies that there is no distinction between uppercase and lowercase in these. However, some implementations, for example DrRacket’s teaching languages, require lowercase for reserved words and predefined functions.
If a function has a fixed number of parameters, such as SQRT, the number of parameters in the call must match that number. If not, the interpreter will produce an error message.
A Scheme program is a collection of function definitions. Consequently, knowing how to define these functions is a prerequisite to writing the simplest program. In Scheme, a nameless function actually includes the word LAMBDA, and is called a lambda expression. For example,
(LAMBDA (x) (* x x))
is a nameless function that returns the square of its given numeric parameter. This function can be applied in the same way that named functions are: by placing it in the beginning of a list that contains the actual parameters. For example, the following expression yields 49:
((LAMBDA (x) (* x x)) 7)
In this expression, x is called a bound variable within the lambda expression. During the evaluation of this expression, x is bound to 7. A bound variable never changes in the expression after being bound to an actual parameter value at the time evaluation of the lambda expression begins.
Lambda expressions can have any number of parameters. For example, we could have the following:
(LAMBDA (a b c x) (+ (* a x x) (* b x) c))
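As a brief sketch, this four-parameter lambda expression can be applied in the same way as the one-parameter version, by placing it at the beginning of a list that contains the actual parameters. The argument values below are chosen only for illustration.

```scheme
; a = 1, b = 2, c = 3, x = 4, so the value is 1*4*4 + 2*4 + 3.
((lambda (a b c x) (+ (* a x x) (* b x) c)) 1 2 3 4)   ; 27
```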
The Scheme special form function DEFINE serves two fundamental needs of Scheme programming: to bind a name to a value and to bind a name to a lambda expression. The form of DEFINE that binds a name to a value may make it appear that DEFINE can be used to create imperative language-style variables. However, these name bindings create named values, not variables.
DEFINE is called a special form because it is interpreted (by EVAL) in a different way than the normal primitives like the arithmetic functions, as we shall soon see.
The simplest form of DEFINE is one used to bind a name to the value of an expression. This form is
(DEFINE symbol expression)
For example,
(DEFINE pi 3.14159)
(DEFINE two_pi (* 2 pi))
If these two expressions have been typed to the Scheme interpreter and then pi is typed, the number 3.14159 will be displayed; when two_pi is typed, 6.28318 will be displayed. In both cases, the displayed numbers may have more digits than are shown here.
This form of DEFINE is analogous to a declaration of a named constant in an imperative language. For example, in Java, the equivalents to the above defined names are as follows:
final float PI = 3.14159f;
final float TWO_PI = 2.0f * PI;
Names in Scheme can consist of letters, digits, and special characters except parentheses; they are case insensitive and must not begin with a digit.
The second use of the DEFINE function is to bind a lambda expression to a name. In this case, the lambda expression is abbreviated by removing the word LAMBDA. To bind a name to a lambda expression, DEFINE takes two lists as parameters. The first parameter is the prototype of a function call, with the function name followed by the formal parameters, together in a list. The second list contains an expression to which the name is to be bound. The general form of such a DEFINE is³
(DEFINE (function_name parameters) (expression))
Of course, this form of DEFINE is the definition of a named function.
The following example call to DEFINE binds the name square to a functional expression that takes one parameter:
(DEFINE (square number) (* number number))
After the interpreter evaluates this function, it can be used, as in
(square 5)
which displays 25.
To illustrate the difference between primitive functions and the DEFINE special form, consider the following:
(DEFINE x 10)
If DEFINE were a primitive function, EVAL’s first action on this expression would be to evaluate the two parameters of DEFINE. If x were not already bound to a value, this would be an error. Furthermore, if x were already defined, it would also be an error, because this DEFINE would attempt to redefine x, which is illegal. Remember, x is the name of a value; it is not a variable in the imperative sense.
Following is another example of a function. It computes the length of the hypotenuse (the longest side) of a right triangle, given the lengths of the two other sides.
(DEFINE (hypotenuse side1 side2)
  (SQRT (+ (square side1) (square side2)))
)
Notice that hypotenuse uses square, which was defined previously.
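The two definitions can be combined into a short session, sketched here in lowercase Scheme. The side lengths 3 and 4 are chosen only for illustration; whether the result is displayed as 5 or 5.0 depends on the implementation.

```scheme
(define (square number) (* number number))

(define (hypotenuse side1 side2)
  (sqrt (+ (square side1) (square side2))))

; (square 3) is 9 and (square 4) is 16, so sqrt is applied to 25.
(hypotenuse 3 4)
```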
Scheme includes a few simple output functions, but when used with the interactive interpreter, most output from Scheme programs is the normal output from the interpreter, displaying the results of applying EVAL to top-level functions.
Note that explicit input and output are not part of the pure functional programming model, because input operations change the program state and output operations have side effects. Neither of these can be part of a pure functional language. Therefore, this chapter does not describe the explicit input or output functions of Scheme.
A predicate function is one that returns a Boolean value (some representation of either true or false). Scheme includes a collection of predicate functions for numeric data. Among them are the following:
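As a hedged illustration, the following sketch applies a few numeric predicates that are part of standard Scheme. It is a sample rather than a complete list, and it is written in lowercase.

```scheme
(= 3 3)       ; #t
(< 2 5)       ; #t
(> 2 5)       ; #f
(>= 2 5)      ; #f
(<= 2 2)      ; #t
(even? 10)    ; #t
(odd? 10)     ; #f
(zero? 0)     ; #t
```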
Notice that the names for all predefined predicate functions that have words for names end with question marks. In Scheme, the two Boolean values are #T and #F (or #t and #f), although the Scheme predefined predicate functions return the empty list, (), for false.
When a list is interpreted as a Boolean, any nonempty list evaluates to true; the empty list evaluates to false. This is similar to the interpretation of integers in C as Boolean values; zero evaluates to false and any nonzero value evaluates to true.
In the interest of readability, all of our example predicate functions in this chapter return #F, rather than ().
The NOT function is used to invert the logic of a Boolean expression.
Scheme uses three different constructs for control flow: one similar to the selection construct of the imperative languages and two based on the evaluation control used in mathematical functions.
The Scheme two-way selector function, named IF, has three parameters: a predicate expression, a then expression, and an else expression. A call to IF has the form
(IF predicate then_expression else_expression)
For example,
(DEFINE (factorial n)
  (IF (<= n 1)
    1
    (* n (factorial (- n 1)))
))
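To see how IF selects between the two expressions, the following sketch repeats the definition in lowercase Scheme and shows how one call unfolds. The argument 4 is chosen only for illustration.

```scheme
(define (factorial n)
  (if (<= n 1)
      1
      (* n (factorial (- n 1)))))

; (factorial 4)
; = (* 4 (factorial 3))
; = (* 4 (* 3 (factorial 2)))
; = (* 4 (* 3 (* 2 (factorial 1))))
; = (* 4 (* 3 (* 2 1)))
(factorial 4)   ; 24
```

Only one of the then and else expressions is evaluated on each call, which is what allows the recursion to stop when n reaches 1.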
Recall that the multiple selection of Scheme, COND, was discussed in Chapter 8. Following is an example of a simple function that uses COND:
(DEFINE (leap? year)
  (COND
    ((ZERO? (MODULO year 400)) #T)
    ((ZERO? (MODULO year 100)) #F)
    (ELSE (ZERO? (MODULO year 4)))
))
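A short sketch of leap? in use follows, written in lowercase Scheme. The years are chosen only for illustration, one for each way a call can be decided.

```scheme
(define (leap? year)
  (cond
    ((zero? (modulo year 400)) #t)
    ((zero? (modulo year 100)) #f)
    (else (zero? (modulo year 4)))))

(leap? 2000)   ; #t, divisible by 400
(leap? 1900)   ; #f, divisible by 100 but not by 400
(leap? 2024)   ; #t, decided by the else clause (divisible by 4)
(leap? 2023)   ; #f, decided by the else clause
```

The clauses are tested in order, so the more specific divisibility tests must come before the general one.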
The following subsections contain additional examples of the use of COND.
The third Scheme control mechanism is recursion, which is used, as in mathematics, to specify repetition. Most of the example functions in Section 15.5.10 use recursion.
One of the more common uses of the Lisp-based programming languages is list processing. This subsection introduces the Scheme functions for dealing with lists. Recall that Scheme’s list operations were briefly introduced in Chapter 6. Following is a more detailed discussion of list processing in Scheme.
Scheme programs are interpreted by the function application function, EVAL. When applied to a primitive function, EVAL first evaluates the parameters of the given function. This action is necessary when the actual parameters in a function call are themselves function calls, which is frequently the case. In some calls, however, the parameters are data elements rather than function references. When a parameter is not a function reference, it obviously should not be evaluated. We were not concerned with this earlier, because numeric literals always evaluate to themselves and cannot be mistaken for function names.
Suppose we have a function that has two parameters, an atom and a list, and the purpose of the function is to determine whether the given atom is in the given list. Neither the atom nor the list should be evaluated; they are literal data to be processed. To avoid evaluating a parameter, it is first given as a parameter to the primitive function QUOTE, which simply returns it without change. The following examples illustrate QUOTE:
(QUOTE A) returns A
(QUOTE (A B C)) returns (A B C)
Calls to QUOTE are usually abbreviated by preceding the expression to be quoted with an apostrophe (') and leaving out the parentheses around the expression. Thus, instead of (QUOTE (A B)), '(A B) is used.
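The difference between a quoted and an unquoted expression can be seen in a small sketch. The names unevaluated and evaluated are hypothetical, and lowercase is used on the assumption of a case-sensitive Scheme.

```scheme
; QUOTE returns its argument as data; without it, the list is a function call.
(define unevaluated (quote (+ 1 2)))  ; the three-element list (+ 1 2)
(define evaluated (+ 1 2))            ; the number 3

(equal? unevaluated '(+ 1 2))  ; the apostrophe form denotes the same list
(car unevaluated)              ; the symbol +, not the addition function
```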
The necessity of QUOTE arises because of the fundamental nature of Scheme (and the other Lisp-based languages): data and code have the same form. Although this may seem odd to imperative language programmers, it results in some interesting and powerful processes, one of which is discussed in Section 15.5.14.
The CAR, CDR, and CONS functions were introduced in Chapter 6. Following are additional examples of the operations of CAR and CDR:
(CAR '(A B C)) returns A
(CAR '((A B) C D)) returns (A B)
(CAR 'A) is an error because A is not a list
(CAR '(A)) returns A
(CAR '()) is an error
(CDR '(A B C)) returns (B C)
(CDR '((A B) C D)) returns (C D)
(CDR 'A) is an error
(CDR '(A)) returns ()
(CDR '()) is an error
The names of the CAR and CDR functions are peculiar at best. The origin of these names lies in the first implementation of Lisp, which was on an IBM 704 computer. The 704’s memory words had two fields, named decrement and address, that were used in various operand addressing strategies. Each of these fields could store a machine memory address. The 704 also included two machine instructions, also named CAR (contents of the address part of a register) and CDR (contents of the decrement part of a register), that extracted the associated fields. It was natural to use the two fields to store the two pointers of a list node so that a memory word could neatly store a node. Using these conventions, the CAR and CDR instructions of the 704 provided efficient list selectors. The names carried over into the primitives of all dialects of Lisp.
As another example of a simple function, consider
(DEFINE (second a_list) (CAR (CDR a_list)))
Once this function is evaluated, it can be used, as in
(second '(A B C))
which returns B.
Some of the most commonly used functional compositions in Scheme are built in as single functions. For example, (CAAR x) is equivalent to (CAR (CAR x)), (CADR x) is equivalent to (CAR (CDR x)), and (CADDAR x) is equivalent to (CAR (CDR (CDR (CAR x)))). Any combination of A’s and D’s, up to four, is legal between the ‘C’ and the ‘R’ in the function’s name. As an example, consider the following evaluation of CADDAR:
(CADDAR '((A B (C) D) E)) =
(CAR (CDR (CDR (CAR '((A B (C) D) E))))) =
(CAR (CDR (CDR '(A B (C) D)))) =
(CAR (CDR '(B (C) D))) =
(CAR '((C) D)) =
(C)
Following are example calls to CONS:
(CONS 'A '()) returns (A)
(CONS 'A '(B C)) returns (A B C)
(CONS '() '(A B)) returns (() A B)
(CONS '(A B) '(C D)) returns ((A B) C D)
The results of these CONS operations are shown in Figure 15.2. Note that CONS is, in a sense, the inverse of CAR and CDR. CAR and CDR take a list apart, and CONS constructs a new list from two given list parts. The two parameters to CONS become the CAR and CDR of the new list. Thus, if a_list is a list, then
(CONS (CAR a_list) (CDR a_list))
returns a list with the same structure and same elements as a_list.
Dealing only with the relatively simple problems and programs discussed in this chapter, it is unlikely one would intentionally apply CONS to two atoms, although that is legal. The result of such an application is a dotted pair, so named because of the way it is displayed by Scheme. For example, consider the following call:
(CONS 'A 'B)
If the result of this is displayed, it would appear as
(A . B)
This dotted pair indicates that instead of an atom and a pointer or a pointer and a pointer, this cell has two atoms.
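Both observations about CONS, that it rebuilds a list from its CAR and CDR and that applying it to two atoms gives a dotted pair rather than a list, can be checked with a short sketch. The names a-list, rebuilt, and dotted are hypothetical, and PAIR? is a standard predicate that the text has not introduced.

```scheme
(define a-list '((a b) c d))  ; sample data chosen for illustration

; Rebuilding a list from its CAR and CDR gives a structurally equal list.
(define rebuilt (cons (car a-list) (cdr a-list)))
(equal? rebuilt a-list)

; CONS of two atoms yields a pair that is not a proper list.
(define dotted (cons 'a 'b))
(pair? dotted)  ; it is a pair
(list? dotted)  ; but not a list, because its CDR is an atom
(cdr dotted)    ; the atom b itself, not the list (b)
```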
LIST is a function that constructs a list from a variable number of parameters. It is a shorthand version of nested CONS functions, as illustrated in the following:
(LIST 'apple 'orange 'grape)
returns
(apple orange grape)
Using CONS, the call to LIST above is written as follows:
(CONS 'apple (CONS 'orange (CONS 'grape '())))
Scheme has three fundamental predicate functions, EQ?, NULL?, and LIST?, for symbolic atoms and lists.
The EQ? function takes two expressions as parameters, although it is usually used with two symbolic atom parameters. It returns #T if both parameters have the same pointer value—that is, they point to the same atom or list; otherwise, it returns #F. If the two parameters are symbolic atoms, EQ? returns #T if they are the same symbols (because Scheme does not make duplicates of symbols); otherwise #F. Consider the following examples:
(EQ? 'A 'A) returns #T
(EQ? 'A 'B) returns #F
(EQ? 'A '(A B)) returns #F
(EQ? '(A B) '(A B)) returns #F or #T
(EQ? 3.4 (+ 3 0.4)) returns #F or #T
As the fourth example indicates, the result of comparing lists with EQ? is not consistent. The reason for this is that two lists that are exactly the same often are not duplicated in memory. At the time the Scheme system creates a list, it checks to see whether there is already such a list. If there is, the new list is nothing more than a pointer to the existing list. In these cases, the two lists will be judged equal by EQ?. However, in some cases, it may be difficult to detect the presence of an identical list, in which case a new list is created. In this scenario, EQ? yields #F.
The last case shows that the addition may produce a new value, in which case it would not be equal (with EQ?) to 3.4, or it may recognize that it already has the value 3.4 and use it, in which case EQ? will use the pointer to the old 3.4 and return #T.
As we have seen, EQ? works for symbolic atoms but does not necessarily work for numeric atoms. The = predicate works for numeric atoms but not symbolic atoms. As discussed previously, EQ? also does not work reliably for list parameters.
Sometimes it is convenient to be able to test two atoms for equality when it is not known whether they are symbolic or numeric. For this purpose, Scheme has a different predicate, EQV?, which works on both numeric and symbolic atoms. Consider the following examples:
(EQV? 'A 'A) returns #T
(EQV? 'A 'B) returns #F
(EQV? 3 3) returns #T
(EQV? 'A 3) returns #F
(EQV? 3.4 (+ 3 0.4)) returns #T
(EQV? 3.0 3) returns #F
Notice that the last example demonstrates that floating-point values are different from integer values. EQV? is not a pointer comparison; it is a value comparison.
The primary reason to use EQ? or = rather than EQV? when it is possible is that EQ? and = are faster than EQV?.
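The cases in which these predicates give dependable answers can be gathered in one sketch. Only comparisons whose results do not vary between implementations are shown. The name xs is hypothetical, and lowercase is used on the assumption of a case-sensitive Scheme.

```scheme
(eq? 'a 'a)    ; symbols: reliably the same object
(eqv? 3 3)     ; numerically equal exact integers
(eqv? 3.0 3)   ; floating-point versus integer: not EQV?
(= 3.0 3)      ; numeric comparison treats them as equal

(define xs (list 'a 'b))
(eq? xs xs)             ; one list compared with itself
(eq? xs (list 'a 'b))   ; LIST builds a fresh list, so this is a different object
```

The last call is a case in which EQ? on lists is predictable: LIST constructs new cells on each call, so the two lists cannot be the same object even though their elements match.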
The LIST? predicate function returns #T if its single argument is a list and #F otherwise, as in the following examples:
(LIST? '(X Y)) returns #T
(LIST? 'X) returns #F
(LIST? '()) returns #T
The NULL? function tests its parameter to determine whether it is the empty list and returns #T if it is. Consider the following examples:
(NULL? '(A B)) returns #F
(NULL? '()) returns #T
(NULL? 'A) returns #F
(NULL? '(())) returns #F
The last call yields #F because the parameter is not the empty list. Rather, it is a list containing a single element, the empty list.
This section contains several examples of function definitions in Scheme. These programs solve simple list-processing problems.
Consider the problem of membership of a given atom in a given list that does not include sublists. Such a list is called a simple list. If the function is named member, it could be used as follows:
(member 'B '(A B C)) returns #T
(member 'B '(A C D E)) returns #F
Thinking in terms of iteration, the membership problem is simply to compare the given atom and the individual elements of the given list, one at a time in some order, until either a match is found or there are no more elements in the list to be compared. A similar process can be accomplished using recursion. The function can compare the given atom with the CAR of the list. If they match, the value #T is returned. If they do not match, the CAR of the list should be ignored and the search continued on the CDR of the list. This can be done by having the function call itself with the CDR of the list as the list parameter and return the result of this recursive call. This process will end if the given atom is found in the list. If the atom is not in the list, the function will eventually be called (by itself) with a null list as the actual parameter. That event must force the function to return #F. In this process, there are two ways out of the recursion: Either the list is empty on some call, in which case #F is returned, or a match is found and #T is returned.
Altogether, there are three cases that must be handled in the function: an empty input list, a match between the atom and the CAR of the list, or a mismatch between the atom and the CAR of the list, which causes the recursive call. These three are the three parameters to COND, with the last being the default case that is triggered by an ELSE predicate. The complete function follows:
(DEFINE (member atm a_list)
(COND
((NULL? a_list) #F)
((EQ? atm (CAR a_list)) #T)
(ELSE (member atm (CDR a_list)))
))
This form is typical of simple Scheme list-processing functions. In such functions, the data in lists are processed one element at a time. The individual elements are specified with CAR, and the process is continued using recursion on the CDR of the list.
Note that the null test must precede the equal test, because applying CAR to an empty list is an error.
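Standard Scheme already provides a function named member, which returns the sublist beginning at the match rather than #T. To try the definition above without replacing that function, the sketch below uses the hypothetical name member? instead; the logic is otherwise unchanged.

```scheme
; Membership test for simple lists, renamed to avoid the built-in member.
(define (member? atm a-list)
  (cond
    ((null? a-list) #f)                  ; ran out of elements: not found
    ((eq? atm (car a-list)) #t)          ; match on the first element
    (else (member? atm (cdr a-list)))))  ; keep searching the rest

(member? 'b '(a b c))    ; found at the second element
(member? 'b '(a c d e))  ; reaches the empty list
(member? 'b '())         ; empty input: the null test fires before any CAR
```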
As another example, consider the problem of determining whether two given lists are equal. If the two lists are simple, the solution is relatively easy, although some programming techniques with which the reader may not be familiar are involved. A predicate function, equalsimp, for comparing simple lists is shown here:
(DEFINE (equalsimp list1 list2)
(COND
((NULL? list1) (NULL? list2))
((NULL? list2) #F)
((EQ? (CAR list1) (CAR list2))
(equalsimp (CDR list1) (CDR list2)))
(ELSE #F)
))
The first case, which is handled by the first parameter to COND, is for when the first list parameter is the empty list. This can occur in an external call if the first list parameter is initially empty. Because a recursive call uses the CDRs of the two parameter lists as its parameters, the first list parameter can be empty in such a call (if the first list parameter is now empty). When the first list parameter is empty, the second list parameter must be checked to see whether it is also empty. If so, they are equal (either initially or the CARs were equal on all previous recursive calls), and NULL? correctly returns #T. If the second list parameter is not empty, it is larger than the first list parameter and #F should be returned, as it is by NULL?.
The next case deals with the second list being empty when the first list is not. This situation occurs only when the first list is longer than the second. Only the second list must be tested, because the first case catches all instances of the first list being empty.
The third case is the recursive step that tests for equality between two corresponding elements in the two lists. It does this by comparing the CARs of the two nonempty lists. If they are equal, then the two lists are equal up to that point, so recursion is used on the CDRs of both. This case fails when two unequal atoms are found. When this occurs, the process need not continue, so the default case ELSE is selected, which returns #F.
Note that equalsimp expects lists as parameters and does not operate correctly if either or both parameters are atoms.
The problem of comparing general lists is slightly more complex than this, because sublists must be traced completely in the comparison process. In this situation, the power of recursion is uniquely appropriate, because the form of sublists is the same as that of the given lists. Any time the corresponding elements of the two given lists are lists, they are separated into their two parts, CAR and CDR, and recursion is used on them. This is a perfect example of the usefulness of the divide-and-conquer approach. If the corresponding elements of the two given lists are atoms, they can simply be compared using EQ?.
The definition of the complete function follows:
(DEFINE (equal list1 list2)
(COND
((NOT (LIST? list1)) (EQ? list1 list2))
((NOT (LIST? list2)) #F)
((NULL? list1) (NULL? list2))
((NULL? list2) #F)
((equal (CAR list1) (CAR list2))
(equal (CDR list1) (CDR list2)))
(ELSE #F)
))
The first two cases of the COND handle the situation where either of the parameters is an atom instead of a list. The third and fourth cases are for the situation where one or both lists are empty. These cases also prevent subsequent cases from attempting to apply CAR to an empty list. The fifth COND case is the most interesting. The predicate is a recursive call with the CARs of the lists as parameters. If this recursive call returns #T, then recursion is used again on the CDRs of the lists. This algorithm allows the two lists to include sublists to any depth.
This definition of equal works on any pair of expressions, not just lists. equal is equivalent to the system predicate function EQUAL?. Note that EQUAL? should be used only when necessary (the forms of the actual parameters are not known), because it is much slower than EQ? and EQV?.
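A few calls on nested lists may help show how the two recursive calls cooperate. The sketch restates the function under the hypothetical name deep-equal, in lowercase, and uses only symbolic atoms, because atoms are compared with EQ?, whose result on numeric atoms can vary.

```scheme
; General list comparison: recurse on the CAR for sublists, on the CDR for the rest.
(define (deep-equal x y)
  (cond
    ((not (list? x)) (eq? x y))       ; x is an atom: compare directly
    ((not (list? y)) #f)              ; x is a list but y is an atom
    ((null? x) (null? y))             ; both must end together
    ((null? y) #f)                    ; y ended first
    ((deep-equal (car x) (car y))     ; first elements match (possibly sublists)
     (deep-equal (cdr x) (cdr y)))    ; so compare the remaining elements
    (else #f)))

(deep-equal '(a (b (c)) d) '(a (b (c)) d))  ; same structure and atoms
(deep-equal '(a (b c)) '(a (b d)))          ; differ inside a sublist
(deep-equal '(a) 'a)                        ; a list versus an atom
```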
Another commonly needed list operation is that of constructing a new list that contains all of the elements of two given list arguments. This is usually implemented as a Scheme function named append. The result list can be constructed by repeated use of CONS to place the elements of the first list argument into the second list argument, which becomes the result list. To clarify the action of append, consider the following examples:
(append '(A B) '(C D R)) returns (A B C D R)
(append '((A B) C) '(D (E F))) returns ((A B) C D (E F))
The definition of append is
(DEFINE (append list1 list2)
(COND
((NULL? list1) list2)
(ELSE (CONS (CAR list1) (append (CDR list1) list2)))
))
The first COND case is used to terminate the recursive process when the first argument list is empty, returning the second list. In the second case (the ELSE), the CAR of the first parameter list is CONSed onto the result returned by the recursive call, which passes the CDR of the first list as its first parameter.
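The recursion can be followed step by step on a small input. The sketch below uses the hypothetical name my-append so that the built-in append is left alone, and the names second-list and joined are likewise illustrative.

```scheme
(define (my-append list1 list2)
  (cond
    ((null? list1) list2)  ; nothing left to copy: return the second list itself
    (else (cons (car list1) (my-append (cdr list1) list2)))))

; Expansion of (my-append '(a b) '(c d)):
;   (cons 'a (my-append '(b) '(c d)))
;   (cons 'a (cons 'b (my-append '() '(c d))))
;   (cons 'a (cons 'b '(c d)))
;   (a b c d)

(define second-list '(c d))
(define joined (my-append '(a b) second-list))
(eq? (cddr joined) second-list)  ; the tail of the result is the second list itself
```

Only the elements of the first list are placed in new cells; the second list is reused as the tail of the result, which is why the final EQ? test succeeds.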
Consider the following Scheme function, named guess, which uses the member function described in this section. Try to determine what it does before reading the description that follows it. Assume the parameters are simple lists.
(DEFINE (guess list1 list2)
(COND
((NULL? list1) '())
((member (CAR list1) list2)
(CONS (CAR list1) (guess (CDR list1) list2)))
(ELSE (guess (CDR list1) list2))
))
guess yields a simple list that contains the common elements of its two parameter lists. So, if the parameter lists represent sets, guess computes a list that represents the intersection of those two sets.
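A sample run may make the intersection behavior concrete. In this sketch the function is renamed intersect and relies on a helper named member?; both names are hypothetical, and the helper repeats the membership test for simple lists so that the block is self-contained.

```scheme
; Membership test for simple lists (returns #t or #f).
(define (member? atm a-list)
  (cond
    ((null? a-list) #f)
    ((eq? atm (car a-list)) #t)
    (else (member? atm (cdr a-list)))))

; Keep each element of list1 that also occurs in list2.
(define (intersect list1 list2)
  (cond
    ((null? list1) '())
    ((member? (car list1) list2)
     (cons (car list1) (intersect (cdr list1) list2)))
    (else (intersect (cdr list1) list2))))

(intersect '(a b c d) '(d b x))  ; (b d): elements appear in list1 order
(intersect '(a b) '(c d))        ; (): no common elements
```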
LET is a function (initially described in Chapter 5) that creates a local scope in which names are temporarily bound to the values of expressions. It is often used to factor out the common subexpressions from more complicated expressions. These names can then be used in the evaluation of another expression, but they cannot be rebound to new values in LET. The following example illustrates the use of LET. It computes the roots of a given quadratic equation, assuming the roots are real. The mathematical definitions of the real (as opposed to complex) roots of the quadratic equation $ax^2 + bx + c = 0$ are as follows:
$$root_1 = \frac{-b + \sqrt{b^2 - 4ac}}{2a} \qquad root_2 = \frac{-b - \sqrt{b^2 - 4ac}}{2a}$$
(DEFINE (quadratic_roots a b c)
(LET (
(root_part_over_2a
(/ (SQRT (- (* b b) (* 4 a c))) (* 2 a)))
(minus_b_over_2a (/ (- 0 b) (* 2 a)))
)
(LIST (+ minus_b_over_2a root_part_over_2a)
(- minus_b_over_2a root_part_over_2a))
))
This example uses LIST to create the list of the two values that make up the result.
Because the names bound in the first part of a LET construct cannot be changed in the following expression, they are not the same as local variables in a block in an imperative language. They could all be eliminated by textual substitution of their respective expressions for their names in the LET expression.
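The textual-substitution claim can be illustrated by writing the function both ways. The sketch below restates the LET version in lowercase with hyphenated names and adds a hypothetical variant, quadratic-roots-substituted, in which each bound name is replaced by its expression. The sample coefficients are chosen here for illustration.

```scheme
; Version with LET: the two common subexpressions are named once.
(define (quadratic-roots a b c)
  (let ((root-part-over-2a
          (/ (sqrt (- (* b b) (* 4 a c))) (* 2 a)))
        (minus-b-over-2a (/ (- 0 b) (* 2 a))))
    (list (+ minus-b-over-2a root-part-over-2a)
          (- minus-b-over-2a root-part-over-2a))))

; Same computation with each name replaced by its expression text.
(define (quadratic-roots-substituted a b c)
  (list (+ (/ (- 0 b) (* 2 a))
           (/ (sqrt (- (* b b) (* 4 a c))) (* 2 a)))
        (- (/ (- 0 b) (* 2 a))
           (/ (sqrt (- (* b b) (* 4 a c))) (* 2 a)))))

(quadratic-roots 1 -3 2)              ; roots of x^2 - 3x + 2, which are 2 and 1
(quadratic-roots-substituted 1 -3 2)  ; the same two values
```

The substituted form computes each subexpression twice, which is the duplication LET is meant to avoid.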
LET is actually shorthand for a LAMBDA expression applied to a parameter. The following two expressions are equivalent:
(LET ((alpha 7))(* 5 alpha))
((LAMBDA (alpha) (* 5 alpha)) 7)
In the first expression, 7 is bound to alpha with LET; in the second, 7 is bound to alpha through the parameter of the LAMBDA expression.
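The same correspondence holds when LET binds more than one name: each name becomes a parameter of the LAMBDA expression, and each bound value becomes an actual parameter. The sketch below is an illustration with hypothetical names (with-let, with-lambda, beta) and values not taken from the text.

```scheme
; Two bindings written with LET ...
(define with-let
  (let ((alpha 7) (beta 3))
    (* beta (+ 5 alpha))))

; ... and the equivalent LAMBDA expression applied to two parameters.
(define with-lambda
  ((lambda (alpha beta) (* beta (+ 5 alpha))) 7 3))

(= with-let with-lambda)  ; both are 36
```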
A function is tail recursive if its recursive call is the last operation in the function. This means that the return value of the recursive call is the return value of the nonrecursive call to the function. For example, the member function of Section 15.5.10, repeated here, is tail recursive.
(DEFINE (member atm a_list)
(COND
((NULL? a_list) #F)
((EQ? atm (CAR a_list)) #T)
(ELSE (member atm (CDR a_list)))
))
This function can be automatically converted by a compiler to use iteration, resulting in faster execution than in its recursive form.
However, many functions that use recursion for repetition are not tail recursive. Programmers who are concerned with efficiency have discovered ways to rewrite some of these functions so that they are tail recursive. One example of this uses an accumulating parameter and a helper function. As an example of this approach, consider the factorial function from Section 15.5.7, which is repeated here:
(DEFINE (factorial n)
(IF (<= n 1)
1
(* n (factorial (- n 1)))
))
The last operation of this function is the multiplication. The function works by creating the list of numbers to be multiplied together and then doing the multiplications as the recursion unwinds to produce the result. Each of these numbers is created by an activation of the function and each is stored in an activation record instance. As the recursion unwinds the numbers are multiplied together. Recall that the stack is shown after several recursive calls to factorial in Chapter 9. This factorial function can be rewritten with an auxiliary helper function, which uses a parameter to accumulate the partial factorial. The helper function, which is tail recursive, also takes factorial’s parameter. These functions are as follows:
(DEFINE (facthelper n factpartial)
(IF (<= n 1)
factpartial
(facthelper (- n 1) (* n factpartial))
))
(DEFINE (factorial n)
(facthelper n 1)
)
With these functions, the result is computed during the recursive calls, rather than as the recursion unwinds. Because there is nothing useful in the activation record instances, they are not necessary. Regardless of how many recursive calls are requested, only one activation record instance is necessary. This makes the tail-recursive version far more efficient than the non-tail-recursive version.
The Scheme language definition requires that Scheme language processing systems convert all tail-recursive functions to replace that recursion with iteration. Therefore, it is important, at least for efficiency’s sake, to define functions that use recursion to specify repetition to be tail recursive. Some optimizing compilers for some functional languages can even perform conversions of some non-tail-recursive functions to equivalent tail-recursive functions and then code these functions to use iteration instead of recursion for repetition.
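The accumulating-parameter technique is not specific to factorial. As a sketch, the hypothetical functions sum-helper and sum-list below apply it to summing a list of numbers; neither name comes from the text.

```scheme
; Tail-recursive helper: partial carries the sum of the elements seen so far.
(define (sum-helper a-list partial)
  (if (null? a-list)
      partial
      (sum-helper (cdr a-list) (+ (car a-list) partial))))  ; last operation is the call

; Wrapper that supplies the initial value of the accumulator.
(define (sum-list a-list)
  (sum-helper a-list 0))

(sum-list '(3 4 5))  ; 12
```

Because the addition is done before the recursive call rather than after it returns, nothing remains to be computed in the caller, which is what makes the call a tail call.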
This section describes two common mathematical functional forms that are provided by Scheme: composition and apply-to-all. Both are mathematically defined in Section 15.2.2.
Functional composition is the only primitive functional form provided by the original Lisp. All subsequent Lisp dialects, including Scheme, also provide it. As stated in Section 15.2.2, function composition is a functional form that takes two functions as parameters and returns a function that first applies the second parameter function to its parameter and then applies the first parameter function to the return value of the second parameter function. In other words, the function h is the composition function of f and g if h(x) = f(g(x)). Consider the following example:
(DEFINE (g x) (* 3 x))
(DEFINE (f x) (+ 2 x))
Now the functional composition of f and g can be written as follows:
(DEFINE (h x) (+ 2 (* 3 x)))
In Scheme, the functional composition function compose can be written as follows:
(DEFINE (compose f g) (LAMBDA (x) (f (g x))))
For example, we could have the following:
((compose CAR CDR) '((a b) c d))
This call would yield c. This is an alternative, though less efficient, form of CADR. Now consider another call to compose:
((compose CDR CAR) '((a b) c d))
This call would yield (b). This is an alternative to CDAR.
As yet another example of the use of compose, consider the following:
(DEFINE (third a_list)
((compose CAR (compose CDR CDR)) a_list))
This is an alternative to CADDR.
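compose can also be applied to the numeric functions f and g defined earlier in this section, which shows that the order of its parameters matters. The sketch below is in lowercase; the sample argument 4 is chosen here for illustration, and some Scheme systems already supply a function named compose, which this definition would replace.

```scheme
(define (compose f g) (lambda (x) (f (g x))))

(define (g x) (* 3 x))
(define (f x) (+ 2 x))

(define h (compose f g))  ; h(x) = f(g(x))

(h 4)              ; (+ 2 (* 3 4)), which is 14
((compose g f) 4)  ; (* 3 (+ 2 4)), which is 18
```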
The most common functional forms provided in functional programming languages are variations of mathematical apply-to-all functional forms. The simplest of these is map, which has two parameters: a function and a list. map applies the given function to each element of the given list and returns a list of the results of these applications. A Scheme definition of map follows:
(DEFINE (map fun a_list)
(COND
((NULL? a_list) '())
(ELSE (CONS (fun (CAR a_list)) (map fun (CDR a_list))))
))
Note the simple form of map, which expresses a complex functional form.
As an example of the use of map, suppose we want all of the elements of a list cubed. We can accomplish this with the following:
(map (LAMBDA (num) (* num num num)) '(3 4 2 6))
This call returns (27 64 8 216).
Note that in this example, the first parameter to map is a LAMBDA expression. When EVAL evaluates the LAMBDA expression, it constructs a function that has the same form as any predefined function except that it is nameless. In the example expression, this nameless function is immediately applied to each element of the parameter list and the results are returned in a list.
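The first parameter need not be a LAMBDA expression; any function of one parameter can be passed by name. The sketch below uses the hypothetical names my-map (to leave the built-in map alone) and cube, written in lowercase.

```scheme
(define (my-map fun a-list)
  (cond
    ((null? a-list) '())
    (else (cons (fun (car a-list)) (my-map fun (cdr a-list))))))

(define (cube num) (* num num num))

(my-map cube '(3 4 2 6))         ; (27 64 8 216), as with the nameless function
(my-map car '((a b) (c d) (e)))  ; (a c e): a predefined function as the parameter
```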
The fact that programs and data have the same structure can be exploited in constructing programs. Recall that the Scheme interpreter uses a function named EVAL. The Scheme system applies EVAL to every expression typed, whether it is at the Scheme prompt in the interactive interpreter or is part of a program being interpreted. The EVAL function can also be called directly by Scheme programs. This provides the possibility of a Scheme program creating expressions and calling EVAL to evaluate them. This is not something that is unique to Scheme, but the simple forms of its expressions make it easy to create them during execution.
One of the simplest examples of this process involves numeric atoms. Recall that Scheme includes a function named +, which takes any number of numeric atoms as arguments and returns their sum. For example, (+ 3 7 10 2) returns 22.
Our problem is the following: Suppose that in a program we have a list of numeric atoms and need the sum. We cannot apply + directly on the list, because + can take only atomic parameters, not a list of numeric atoms. We could, of course, write a function that repeatedly adds the CAR of the list to the sum of its CDR, using recursion to go through the list. Such a function follows:
(DEFINE (adder a_list)
(COND
((NULL? a_list) 0)
(ELSE (+ (CAR a_list) (adder (CDR a_list))))
))
Following is an example call to adder, along with the recursive calls and returns:
(adder '(3 4 5))
(+ 3 (adder (4 5)))
(+ 3 (+ 4 (adder (5))))
(+ 3 (+ 4 (+ 5 (adder ()))))
(+ 3 (+ 4 (+ 5 0)))
(+ 3 (+ 4 5))
(+ 3 9)
12
An alternative solution to the problem is to write a function that builds a call to + with the proper parameter forms. This can be done by using CONS to build a new list that is identical to the parameter list except it has the atom + inserted at its beginning. This new list can then be submitted to EVAL for evaluation, as in the following:
(DEFINE (adder a_list)
(COND
((NULL? a_list) 0)
(ELSE (EVAL (CONS '+ a_list)))
))
Note that the + function’s name is quoted to prevent EVAL from evaluating it in the evaluation of CONS. Following is an example call to this new version of adder, along with the call to EVAL and the return value:
(adder '(3 4 5))
(EVAL '(+ 3 4 5))
12
In all earlier versions of Scheme, the EVAL function evaluated its expression in the outermost scope of the program. The later versions of Scheme, beginning with Scheme 4, require a second parameter to EVAL that specifies the scope in which the expression is to be evaluated. For simplicity’s sake, we left the scope parameter out of our example, and we do not discuss scope names here.
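Haskell has no counterpart to EVAL, but the idea of building an expression as data and then evaluating it can still be sketched. In the following, the Expr type and the eval function are our own hypothetical, minimal stand-ins (numbers and sums only), not part of any library; adder builds the data form of (+ x1 x2 ...) and hands it to eval.

```haskell
-- Expr is a hypothetical, minimal expression type: a number or a sum.
data Expr = Num Int | Add [Expr]

-- A tiny evaluator, playing the role that EVAL plays in Scheme.
eval :: Expr -> Int
eval (Num n)  = n
eval (Add es) = sum (map eval es)

-- Build the expression (+ x1 x2 ...) from a list, then evaluate it.
adder :: [Int] -> Int
adder xs = eval (Add (map Num xs))

main :: IO ()
main = print (adder [3, 4, 5])   -- prints 12
```

Unlike the Scheme version, the constructed expression here is a value of a separate data type rather than a list that is itself program text, which is exactly the convenience that the uniform structure of Scheme programs and data provides.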
Common Lisp (Steele, 1990) was created in an effort to combine the features of several early 1980s dialects of Lisp, including Scheme, into a single language. Being something of a union of languages, it is quite large and complex, similar in these regards to C++ and C#. Its basis, however, is the original Lisp, so its syntax, primitive functions, and fundamental nature come from that language.
Following is the factorial function written in Common Lisp:
(DEFUN factorial (n)
  (IF (<= n 1)
    1
    (* n (factorial (- n 1)))
))
Only the first line of this function differs syntactically from the Scheme version of the same function.
The list of features of Common Lisp is long: a large number of data types and structures, including records, arrays, complex numbers, and character strings; powerful input and output operations; and a form of packages for modularizing collections of functions and data, and also for providing access control. Common Lisp includes several imperative constructs, as well as some mutable types.
Recognizing the occasional flexibility provided by dynamic scoping, as well as the simplicity of static scoping, Common Lisp allows both. The default scoping for variables is static, but by declaring a variable to be “special,” that variable becomes dynamically scoped.
Macros are often used in Common Lisp to extend the language. In fact, some of the predefined functions are actually macros. One of these is DOLIST, which takes two parameters, a variable and a list. Consider the following:
(DOLIST (x '(1 2 3)) (print x))
This produces the following:
1
2
3
NIL
NIL here is the return value of DOLIST.
Macros create their effect in two steps: First, the macro is expanded. Second, the expanded macro, which is Lisp code, is evaluated. Users can define their own macros with DEFMACRO.
The Common Lisp backquote operator (`) is similar to Scheme’s QUOTE, except some parts of the parameter can be unquoted by preceding them with commas. For example, consider the following two examples:
`(a (* 3 4) c)
This expression evaluates to (a (* 3 4) c). However, the following expression:
`(a ,(* 3 4) c)
evaluates to (a 12 c).
Lisp implementations have a front end called the reader that transforms the text of Lisp programs into a code representation. Then, the macro calls in the code representation are expanded into code representations. The output of this step is then either interpreted or compiled into the machine language of the host computer, or perhaps into an intermediate code that can be interpreted. There is a special kind of macro, called a reader macro or read macro, that is expanded during the reader phase of a Lisp language processor. A reader macro expands a specific character into a string of Lisp code. For example, the apostrophe in Lisp is a read macro that expands to a call to QUOTE. Users can define their own reader macros to create other shorthand constructs.
Common Lisp, as well as other Lisp-based languages, has a symbol data type. The reserved words are symbols that evaluate to themselves, as are T and NIL. Technically, symbols are either bound or unbound. Parameter symbols are bound while the function is being evaluated. Also, symbols that are the names of imperative-style variables and have been assigned values are bound. Other symbols are unbound. For example, consider the following expression:
(LIST '(A B C))
The symbols A, B, and C are unbound. Recall that Ruby also has a symbol data type.
In a sense, Scheme and Common Lisp are opposites. Scheme is far smaller and semantically simpler, in part because of its exclusive use of static scoping, but also because it was designed to be used for teaching programming, whereas Common Lisp was meant to be a commercial language. Common Lisp has succeeded in being a widely used language for AI applications, among other areas. Scheme, on the other hand, is more frequently used in college courses on functional programming. It is also more likely to be studied as a functional language because of its relatively small size. An important design goal of Common Lisp that caused it to be a large language was the desire to make it compatible with several of the dialects of Lisp from which it was derived.
The Common Lisp Object System (CLOS) (Paepeke, 1993) was developed in the late 1980s as an object-oriented version of Common Lisp. This language supports generic functions and multiple inheritance, among other constructs.
ML (Milner et al., 1997) is a static-scoped functional programming language, like Scheme. However, it differs from Lisp and its dialects, including Scheme, in a number of significant ways. One important difference is that ML is a strongly typed language, whereas Scheme is essentially typeless. ML has type declarations for function parameters and the return types of functions, although because of its type inferencing they are often not used. The type of every variable and expression can be statically determined. ML, like other functional programming languages, does not have variables in the sense of the imperative languages. It does have identifiers, which have the appearance of names of variables in imperative languages. However, these identifiers are best thought of as names for values. Once set, they cannot be changed. They are like the named constants of imperative languages like final declarations in Java. ML identifiers do not have fixed types—any identifier can be the name of a value of any type.
A table called the evaluation environment stores the names of all implicitly and explicitly declared identifiers in a program, along with their types. This is like a run-time symbol table. When an identifier is declared, either implicitly or explicitly, it is placed in the evaluation environment.
Another important difference between Scheme and ML is that ML uses a syntax that is more closely related to that of an imperative language than that of Lisp. For example, arithmetic expressions are written in ML using infix notation.
Function declarations in ML appear in the general form
fun function_name(formal parameters) = expression;
When called, the value of the expression is returned by the function. Actually, the expression can be a list of expressions, separated by semicolons and surrounded by parentheses. The return value in this case is that of the last expression. Of course, unless they have side effects, the expressions before the last serve no purpose. Because we are not considering the parts of ML that have side effects, we only consider function definitions with a single expression.
Now we can discuss type inference. Consider the following ML function declaration:
fun circumf(r) = 3.14159 * r * r;
This specifies a function named circumf that takes a floating-point (real in ML) parameter and produces a floating-point result. The types are inferred from the type of the literal in the expression. Likewise, in the function
fun times10(x) = 10 * x;
the parameter and functional value are inferred to be of type int.
Consider the following ML function:
fun square(x) = x * x;
ML determines the type of both the parameter and the return value from the * operator in the function definition. Because this is an arithmetic operator, the type of the parameter and the function are assumed to be numeric. In ML, the default numeric type is int. So, it is inferred that the type of the parameter and the return value of square is int.
If square were called with a floating-point value, as in
square(2.75);
it would cause an error, because ML does not coerce real values to int type. If we wanted square to accept real parameters, it could be rewritten as
fun square(x) : real = x * x;
Because ML does not allow overloaded functions, this version could not coexist with the earlier int version. The last version defined would be the only one.
The fact that the functional value is typed real is sufficient to infer that the parameter is also real type. Each of the following definitions is also legal:
fun square(x : real) = x * x;
fun square(x) = (x : real) * x;
fun square(x) = x * (x : real);
Type inference is also used in the functional languages Miranda, Haskell, and F#.
The ML selection control flow construct is similar to that of the imperative languages. It has the following general form:
if expression then then_expression else else_expression
The first expression must evaluate to a Boolean value.
The conditional expressions of Scheme can appear at the function definition level in ML. In Scheme, the COND function is used to determine the value of the given parameter, which in turn specifies the value returned by COND. In ML, the computation performed by a function can be defined for different forms of the given parameter. This feature is meant to mimic the form and meaning of conditional function definitions in mathematics. In ML, the particular expression that defines the return value of a function is chosen by pattern matching against the given parameter. For example, without using this pattern matching, a function to compute factorial could be written as follows:
fun fact(n : int): int = if n <= 1 then 1
                         else n * fact(n - 1);
Multiple definitions of a function can be written using parameter pattern matching. The different function definitions that depend on the form of the parameter are separated by an OR symbol (|). For example, using pattern matching, the factorial function could be written as follows:
fun fact(0) = 1
| fact(1) = 1
| fact(n : int): int = n * fact(n - 1);
If fact is called with the actual parameter 0, the first definition is used; if the actual parameter is 1, the second definition is used; if an int value that is neither 0 nor 1 is sent, the third definition is used.
As discussed in Chapter 6, ML supports lists and list operations. Recall that hd, tl, and :: are ML’s versions of Scheme’s CAR, CDR, and CONS.
Because of the availability of patterned function parameters, the hd and tl functions are much less frequently used in ML than CAR and CDR are used in Scheme. For example, in a formal parameter, the expression
(h :: t)
is actually two formal parameters, the head and tail of the given list parameter, while the single corresponding actual parameter is a list. For example, the number of elements in a given list can be computed with the following function:
fun length([]) = 0
| length(h :: t) = 1 + length(t);
As another example of these concepts, consider the append function, which does what the Scheme append function does:
fun append([], lis2) = lis2
| append(h :: t, lis2) = h :: append(t, lis2);
The first case in this function handles the situation of the function being called with an empty list as the first parameter. This case also terminates the recursion when the initial call has a nonempty first parameter. The second case of the function breaks the first parameter list into its head and tail (hd and tl). The head is CONSed onto the result of the recursive call, which uses the tail as its first parameter.
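For comparison, the same two patterned definitions can be sketched in Haskell, which is discussed later in this chapter. The name len is our own, chosen to avoid a clash with Haskell's predefined length; Haskell writes the list-construction pattern with a single colon.

```haskell
-- Haskell writes the cons pattern as (h : t) where ML writes (h :: t).
-- len is a hypothetical name, used to avoid the Prelude's length.
len :: [a] -> Int
len []      = 0
len (_ : t) = 1 + len t

append :: [a] -> [a] -> [a]
append [] lis2      = lis2
append (h : t) lis2 = h : append t lis2

main :: IO ()
main = do
  print (len "abc")                        -- prints 3
  print (append [1, 2] [3, 4, 5 :: Int])   -- prints [1,2,3,4,5]
```

As in the ML version, the empty-list case of append both handles an empty first argument and ends the recursion.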
In ML, names are bound to values with value declaration statements of the form
val new_name = expression;
For example,
val distance = time * speed;
Do not get the idea that this statement is exactly like the assignment statements in the imperative languages, for it is not. The val statement binds a name to a value, but the name cannot be later rebound to a new value. Well, in a sense it can. Actually, if you do rebind a name with a second val statement, it causes a new entry in the evaluation environment that is not related to the previous version of the name. In fact, after the new binding, the old evaluation environment entry (for the previous binding) is no longer visible. Also, the type of the new binding need not be the same as that of the previous binding. val statements do not have side effects. They simply add a name to the current evaluation environment and bind it to a value.
The normal use of val is in a let expression. Consider the following example:
let val radius = 2.7
val pi = 3.14159
in pi * radius * radius
end;
ML includes several higher-order functions that are commonly used in functional programming. Among these are a filtering function for lists, filter, which takes a predicate function as its parameter. The predicate function is often given as a lambda expression, which in ML is defined exactly like a function, except with the fn reserved word, instead of fun, and of course the lambda expression is nameless. filter returns a function that takes a list as a parameter. It tests each element of the list with the predicate. Each element on which the predicate returns true is added to a new list, which is the return value of the function. Consider the following use of filter:
filter (fn x => x < 100) [25, 1, 50, 711, 100, 150, 27, 161, 3];
This application would return [25, 1, 50, 27, 3].
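The same filtering can be sketched in Haskell, whose predefined filter is also curried: applying it to a predicate alone yields a function that takes a list. The name smallOnes is our own, introduced only for illustration.

```haskell
-- Applying filter to just the predicate returns a function on lists.
-- smallOnes is a hypothetical name for that resulting function.
smallOnes :: [Int] -> [Int]
smallOnes = filter (\x -> x < 100)

main :: IO ()
main = print (smallOnes [25, 1, 50, 711, 100, 150, 27, 161, 3])
-- prints [25,1,50,27,3]
```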
The map function takes a single parameter, which is a function. The resulting function takes a list as a parameter. It applies its function to each element of the list and returns a list of the results of those applications. Consider the following code:
fun cube x = x * x * x;
val cubeList = map cube;
val newList = cubeList [1, 3, 5];
After execution, the value of newList is [1, 27, 125]. This could be done more simply by defining the cube function as a lambda expression, as in the following:
val newList = map (fn x => x * x * x) [1, 3, 5];
ML has a binary operator for composing two functions, o (a lowercase “oh”). For example, to build a function h that first applies function f and then applies function g to the returned value from f, we could use the following:
val h = g o f;
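Haskell provides the same operation with the predefined operator written as a period. The following sketch uses two small functions of our own, cube and addOne, purely for illustration, and also shows map applied to a named function.

```haskell
cube :: Int -> Int
cube x = x * x * x

addOne :: Int -> Int
addOne x = x + 1

-- h first applies cube, then applies addOne to the result.
h :: Int -> Int
h = addOne . cube

main :: IO ()
main = do
  print (map cube [1, 3, 5])   -- prints [1,27,125]
  print (h 2)                  -- prints 9
```

As with ML's o, the function written on the right of the operator is the one applied first.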
Strictly speaking, ML functions take a single parameter. When a function is defined with more than one parameter, ML considers the parameters to be a tuple, even though the parentheses that normally delimit a tuple value are optional. The commas that separate the parameters (tuple elements) are required.
The process of currying replaces a function with more than one parameter with a function with one parameter that returns a function that takes the other parameters of the initial function.
ML functions that take more than one parameter can be defined in curried form by leaving out the commas between the parameters (and the delimiting parentheses). For example, we could have the following:
fun add a b = a + b;
Although this appears to define a function with two parameters, it actually defines one with just one parameter. The add function takes an integer parameter (a) and returns a function that also takes an integer parameter (b). A call to this function also excludes the commas between the parameters, as in the following:
add 3 5;
This call to add returns 8, as expected.
Curried functions are interesting and useful because new functions can be constructed from them by partial evaluation. Partial evaluation means that the function is evaluated with actual parameters for one or more of the leftmost formal parameters. For example, we could define a new function as follows:
fun add5 x = add 5 x;
The add5 function takes the actual parameter 5 and evaluates the add function with 5 as the value of its first formal parameter. It returns a function that adds 5 to its single parameter, as in the following:
val num = add5 10;
The value of num is now 15. We could create any number of new functions from the curried function add to add any specific number to a given parameter.
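The same ideas can be sketched in Haskell, where functions written with several parameters are curried by default. The name addPair is our own, introduced to show the tupled form; curry is Haskell's predefined function for converting a function on pairs into curried form.

```haskell
add :: Int -> Int -> Int
add a b = a + b

-- Partial evaluation: supplying only the first argument yields a new function.
add5 :: Int -> Int
add5 = add 5

-- The tupled form takes one parameter that is a pair (addPair is a
-- hypothetical name); the predefined curry converts it to curried form.
addPair :: (Int, Int) -> Int
addPair (a, b) = a + b

main :: IO ()
main = do
  print (add 3 5)             -- prints 8
  print (add5 10)             -- prints 15
  print (curry addPair 3 5)   -- prints 8
```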
Curried functions also can be written in Scheme, Haskell, and F#. Consider the following Scheme function:
(DEFINE (add x y) (+ x y))
A curried version of this would be as follows:
(DEFINE (add y) (LAMBDA (x) (+ y x)))
This can be called as follows:
((add 3) 4)
ML has enumerated types, arrays, and tuples. ML also has exception handling and a module facility for implementing abstract data types.
ML has had a significant impact on the evolution of programming languages. For language researchers, it has become one of the most studied languages. Furthermore, it has spawned several subsequent languages, among them Haskell, Caml, OCaml, and F#.
Haskell (Thompson, 1999) is similar to ML in that it uses a similar syntax, is static scoped, is strongly typed, and uses the same type inferencing method. There are three characteristics of Haskell that set it apart from ML: First, functions in Haskell can be overloaded (functions in ML cannot). Second, nonstrict semantics are used in Haskell, whereas in ML (and most other programming languages) strict semantics are used. Third, Haskell is a pure functional programming language, meaning it has no expressions or statements that have side effects, whereas ML allows some side effects (for example, ML has mutable arrays). Both nonstrict semantics and function overloading are further discussed later in this section.
The code in this section is written in version 1.4 of Haskell.
Consider the following definition of the factorial function, which uses pattern matching on its parameters:
fact 0 = 1
fact 1 = 1
fact n = n * fact (n - 1)
Note the differences in syntax between this definition and its ML version in Section 15.7. First, there is no reserved word to introduce the function definition (fun in ML). Second, alternative definitions of functions (with different formal parameters) all have the same appearance.
Using pattern matching, we can define a function for computing the nth Fibonacci number with the following:
fib 0 = 1
fib 1 = 1
fib (n + 2) = fib (n + 1) + fib n
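The pattern (n + 2) used above is an n+k pattern, which later versions of Haskell (beginning with Haskell 2010) no longer accept by default. A minimal sketch of the same function without that pattern follows; the guard and the error message for negative arguments are our own additions.

```haskell
-- Same numbering as the definition above: fib 0 and fib 1 are both 1.
fib :: Integer -> Integer
fib 0 = 1
fib 1 = 1
fib n
  | n >= 2    = fib (n - 1) + fib (n - 2)
  | otherwise = error "fib: negative argument"

main :: IO ()
main = print (map fib [0 .. 7])   -- prints [1,1,2,3,5,8,13,21]
```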
Guards can be added to lines of a function definition to specify the circumstances under which the definition can be applied. For example,
fact n
| n == 0 = 1
| n == 1 = 1
| n > 1 = n * fact(n - 1)
This definition of factorial is more precise than the previous one, as it restricts the range of actual parameter values to those for which it works. This form of a function definition is called a conditional expression, after the mathematical expressions on which it is based.
An otherwise can appear as the last condition in a conditional expression, with the obvious semantics. For example,
sub n
| n < 10 = 0
| n > 100 = 2
| otherwise = 1
Notice the similarity between the guards here and the guarded commands discussed in Chapter 8.
Consider the following function definition, whose purpose is the same as the corresponding ML function in Section 15.7:
square x = x * x
In this case, however, because of Haskell’s support for polymorphism, this function can take a parameter of any numeric type.
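This polymorphism can be made visible by writing out the type that Haskell infers for square. In the sketch below, the explicit signature and the two annotated calls are our additions for illustration; the constraint Num a says that the parameter may be of any numeric type.

```haskell
-- The signature is optional; it is the type Haskell would infer.
square :: Num a => a -> a
square x = x * x

main :: IO ()
main = do
  print (square (7 :: Int))         -- prints 49
  print (square (2.75 :: Double))   -- prints 7.5625
```

Both calls use the single definition, in contrast to ML, where the int and real versions of square could not coexist.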
As with ML, lists are written in brackets in Haskell, as in
colors = ["blue", "green", "red", "yellow"]
Haskell includes a collection of list operators. For example, lists can be catenated with ++, : serves as an infix version of CONS, and .. is used to specify an arithmetic series in a list. For example,
5:[2, 7, 9] results in [5, 2, 7, 9]
[1, 3..11] results in [1, 3, 5, 7, 9, 11]
[1, 3, 5] ++ [2, 4, 6] results in [1, 3, 5, 2, 4, 6]
Notice that the : operator is just like ML’s :: operator. Using : and pattern matching, we can define a simple function to compute the product of a given list of numbers:
product [] = 1
product (a:x) = a * product x
Using product, we can write a factorial function in the simpler form
fact n = product [1..n]
Haskell includes a let construct that is similar to ML’s let and val. For example, we could write
quadratic_root a b c =
  let minus_b_over_2a = - b / (2.0 * a)
      root_part_over_2a =
        sqrt(b ^ 2 - 4.0 * a * c) / (2.0 * a)
  in
    (minus_b_over_2a - root_part_over_2a,
     minus_b_over_2a + root_part_over_2a)
Haskell’s list comprehensions were introduced in Chapter 6. Consider the following example:
[n * n * n | n <- [1..50]]
This defines a list of the cubes of the numbers from 1 to 50. It is read as “a list of all n*n*n such that n is taken from the range of 1 to 50.” In this case, the qualifier is in the form of a generator. It generates the numbers from 1 to 50. In other cases, the qualifiers are in the form of Boolean expressions and they are called tests. This notation can be used to describe algorithms for doing many things, such as finding permutations of lists and sorting lists. For example, consider the following function, which when given a number n returns a list of all its factors:
factors n = [ i | i <- [1..n `div` 2], n `mod` i == 0]
The list comprehension in factors creates a list of numbers, each temporarily bound to the name i, ranging from 1 to n/2, such that n `mod` i is zero. This is indeed a very exacting and short definition of the factors of a given number. The backticks (backward apostrophes) surrounding div and mod are used to specify the infix use of these functions. When they are called in functional notation, as in div n 2, the backticks are not used.
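As a quick check of this definition, the following sketch adds a type signature (our addition, fixing the type to Int) and two sample calls. Note that because the generator stops at n/2, the number itself is not included among its factors.

```haskell
factors :: Int -> [Int]
factors n = [i | i <- [1 .. n `div` 2], n `mod` i == 0]

main :: IO ()
main = do
  print (factors 12)   -- prints [1,2,3,4,6]
  print (factors 7)    -- prints [1]
```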
Next, consider the concision of Haskell illustrated in the following implementation of the quicksort algorithm:
sort [] = []
sort (h:t) = sort [b | b <- t, b <- h]
++ [h] ++
sort [b | b <- t, b > h]
sort [] = []
sort (h:t) = sort [b | b <- t, b <- h]
++ [h] ++
sort [b | b <- t, b > h]
In this program, the list elements that are smaller than or equal to the list head are sorted and catenated with the head element; then the elements that are greater than the list head are sorted and catenated onto the previous result. This definition of quicksort is significantly shorter and simpler than the same algorithm coded in an imperative language.
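For readers more familiar with Python, the same structure can be sketched with Python list comprehensions. This is an illustration of the idea under the assumption that the input is a Python list of mutually comparable values; the name quicksort is ours.

```python
def quicksort(items):
    # Base case: an empty list is already sorted.
    if not items:
        return []
    h, t = items[0], items[1:]
    # Sort the elements <= head, then the head, then the elements > head.
    smaller = [b for b in t if b <= h]
    larger = [b for b in t if b > h]
    return quicksort(smaller) + [h] + quicksort(larger)

print(quicksort([3, 1, 4, 1, 5, 9, 2, 6]))   # [1, 1, 2, 3, 4, 5, 6, 9]
```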
A programming language is strict if it requires all actual parameters to be fully evaluated, which ensures that the value of a function does not depend on the order in which the parameters are evaluated. A language is nonstrict if it does not have the strict requirement. Nonstrict languages can have several distinct advantages over strict languages. First, nonstrict languages are generally more efficient, because some evaluation is avoided.11 Second, some interesting capabilities are possible with nonstrict languages that are not possible with strict languages. Among these are infinite lists. Nonstrict languages can use an evaluation form called lazy evaluation, which means that expressions are evaluated only if and when their values are needed.
Recall that in Scheme the parameters to a function are fully evaluated before the function is called, so it has strict semantics. Lazy evaluation means that an actual parameter is evaluated only when its value is necessary to evaluate the function. So, if a function has two parameters, but on a particular execution of the function the first parameter is not used, the actual parameter passed for that execution will not be evaluated. Furthermore, if only a part of an actual parameter must be evaluated for an execution of the function, the rest is left unevaluated. Finally, actual parameters are evaluated only once, if at all, even if the same actual parameter appears more than once in a function call.
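Python is a strict language, but the behavior just described can be imitated by passing explicitly delayed computations. The sketch below is an emulation, not how Haskell is implemented; delay, expensive, and second_twice are hypothetical names, and the standard functools.lru_cache decorator supplies the evaluate-at-most-once behavior.

```python
from functools import lru_cache

calls = []   # records which actual parameters were really evaluated

def delay(compute):
    # Wrap a zero-argument function so that it runs at most once.
    return lru_cache(maxsize=None)(compute)

def expensive(label, value):
    calls.append(label)
    return value

def second_twice(a, b):
    # The first parameter is never used; the second is used twice.
    return b() + b()

result = second_twice(delay(lambda: expensive("a", 1)),
                      delay(lambda: expensive("b", 21)))
print(result, calls)   # 42 ['b']
```

The unused parameter a is never evaluated, and b is evaluated only once even though its value is needed twice.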
As stated previously, lazy evaluation allows one to define infinite data structures. For example, consider the following:
positives = [0..]
evens = [2, 4..]
squares = [n * n | n <- [0..]]
Of course, no computer can actually represent all of the numbers of these lists, but that does not prevent their use if lazy evaluation is used. For example, if we wanted to know if a particular number was a perfect square, we could check the squares list with a membership function. Suppose we had a predicate function named member that determined whether a given atom is contained in a given list. Then we could use it as in
member 16 squares
which would return True. The squares definition would be evaluated until the 16 was found. The member function would need to be carefully written. Specifically, suppose it were defined as follows:
member b [] = False
member b (a:x)= (a == b) || member b x
The second line of this definition breaks the second parameter, the list, into its head and tail. Its return value is True if either the head matches the element for which it is searching (b) or the recursive call with the tail of the list returns True.
This definition of member would work correctly with squares only if the given number were a perfect square. If not, squares would keep generating squares forever, or until some memory limitation was reached, looking for the given number in the list. The following function performs the membership test of an ordered list, abandoning the search and returning False if a number greater than the searched-for number is found.12
member2 n (m:x)
  | m < n = member2 n x
  | m == n = True
  | otherwise = False
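Python generators offer a comparable way to work with a conceptually infinite ordered collection. The following sketch assumes the source is in ascending order, as squares is; the function names mirror the Haskell ones, and itertools.count is the standard unbounded counter.

```python
from itertools import count

def squares():
    # Infinite generator: 0, 1, 4, 9, ... produced one element at a time.
    return (n * n for n in count(0))

def member2(n, ordered):
    # Membership test for an ascending iterable. The search is abandoned
    # at the first element greater than n, so it terminates even when
    # the source is infinite.
    for m in ordered:
        if m == n:
            return True
        if m > n:
            return False
    return False   # a finite source was exhausted

print(member2(16, squares()))   # True
print(member2(17, squares()))   # False
```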
Lazy evaluation sometimes provides a modularization tool. Suppose that in a program there is a call to function f and the parameter to f is the return value of a function g.13 So, we have f(g(x)). Further suppose that g produces a large amount of data, a little at a time, and that f must then process this data, a little at a time. In a conventional imperative language, g would run on the whole input producing a file of its output. Then f would run using the file as its input. This approach requires the time to both write and read the file, as well as the storage for the file. With lazy evaluation, the executions of f and g implicitly would be tightly synchronized. Function g will execute only long enough to produce enough data for f to begin its processing. When f is ready for more data, g will be restarted to produce more, while f waits. If f terminates without getting all of g’s output, g is aborted, thereby avoiding useless computation. Also, g need not be a terminating function, perhaps because it produces an infinite amount of output. g will be forced to terminate when f terminates. So, under lazy evaluation, g runs as little as possible. This evaluation process supports the modularization of programs into generator units and selector units, where the generator produces a large number of possible results and the selector chooses the appropriate subset.
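The generator and selector arrangement can be sketched in Python, where generators give a similar on-demand behavior. This is only an illustration: g, f, and the produced list are hypothetical names, and the multiple-of-7 test is an arbitrary selection criterion chosen for the example.

```python
from itertools import islice

produced = []   # records every value that g actually generates

def g(x):
    # Generator unit: a nonterminating producer of x, x + 1, x + 2, ...
    while True:
        produced.append(x)
        yield x
        x += 1

def f(values):
    # Selector unit: keeps the first three multiples of 7, then stops.
    return list(islice((v for v in values if v % 7 == 0), 3))

print(f(g(1)))         # [7, 14, 21]
print(len(produced))   # 21
```

Although g never terminates on its own, it produces only the 21 values that f needed before f finished.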
Lazy evaluation is not without its costs. It would certainly be surprising if such expressive power and flexibility were free. In this case, the cost is in a far more complicated semantics, which results in much slower speed of execution.
F# is a .NET functional programming language whose core is based on OCaml, which is a descendant of ML and Haskell. Although it is fundamentally a functional language, it includes imperative features and supports object-oriented programming. Among the most important characteristics of F# are its full-featured IDE, its extensive library of utilities that supports imperative, object-oriented, and functional programming, and its interoperability with a collection of nonfunctional languages (all of the .NET languages).
F# is a first-class .NET language. This means that F# programs can interact in every way with other .NET languages. For example, F# classes can be used and subclassed by programs in other languages, and vice versa. Furthermore, F# programs have access to all of the .NET Framework APIs. The F# implementation is available free from Microsoft (http:/. It is also supported by Visual Studio.
F# includes a variety of data types. Among these are tuples, like those of Python and the functional languages ML and Haskell; lists; discriminated unions, which are an expansion of ML’s unions; and records, like those of ML, which are like tuples except that the components are named. F# has both mutable and immutable arrays.
Recall from Chapter 6 that F#’s lists are similar to those of ML, except that the elements are separated by semicolons and hd and tl must be called as methods of List.
F# supports sequence values, which are values of the .NET interface type System.Collections.Generic.IEnumerable. In F#, sequence types are abbreviated as seq<type>, where <type> indicates the type of the generic. For example, the type seq<int> is a sequence of integer values. Sequence values can be created with generators and they can be iterated. The simplest sequences are generated with range expressions, as in the following example:
let x = seq {1..4};;
In the examples of F#, we assume that the interactive interpreter is used, which requires the two semicolons at the end of each statement. The expression above generates seq[1; 2; 3; 4]. (List and sequence elements are separated by semicolons.) The generation of a sequence is lazy; for example, the following defines y to be a very long sequence, but only the needed elements are generated. For display, only the first four are generated.
let y = seq {0..100000000};;
y;;
val it: seq<int> = seq[0; 1; 2; 3;...]
The first line above defines y; the second line requests that the value of y be displayed; the third is the output of the F# interactive interpreter.
The default step size for integer sequence definitions is 1, but it can be set by including it in the middle of the range specification, as in the following example:
seq {1..2..7};;
This generates seq [1; 3; 5; 7].
The values of a sequence can be iterated with a for-in construct, as in the following example:
let seq1 = seq {0..3..11};;
for value in seq1 do printfn "value = %d" value;;
This produces the following:
value = 0
value = 3
value = 6
value = 9
Iterators can also be used to create sequences, as in the following example:
let cubes = seq {for i in 1..5 -> (i, i * i * i)};;
This generates the following sequence of tuples:
seq [(1, 1); (2, 8); (3, 27); (4, 64); (5, 125)]
This use of iterators to generate collections is a form of list comprehension.
Sequencing can also be used to generate lists and arrays, although in these cases the generation is not lazy. In fact, the primary difference between lists and sequences in F# is that sequences are lazy, and thus can be infinite, whereas lists are not lazy. Lists are stored in memory in their entirety. That is not the case with sequences.
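Python draws a similar line between generator expressions, which are lazy, and list comprehensions, which are not. The sketch below is an analogy rather than F# behavior; cube and the evaluated list are hypothetical names used to make the moment of evaluation visible.

```python
evaluated = []   # records each element that is actually computed

def cube(i):
    evaluated.append(i)
    return (i, i * i * i)

lazy_cubes = (cube(i) for i in range(1, 6))    # generator: nothing computed yet
print(evaluated)                                # []
print(next(lazy_cubes))                         # (1, 1)

eager_cubes = [cube(i) for i in range(1, 6)]    # list: all five computed at once
print(len(evaluated))                           # 6
```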
The functions of F# are similar to those of ML and Haskell. If named, they are defined with let statements. If unnamed, which means technically they are lambda expressions, they are defined with the fun reserved word. The following lambda expression illustrates their syntax:
(fun a b -> a / b)
Note that there is no difference between a name defined with let and a function without parameters defined with let.
Indentation is used to show the extent of a function definition. For example, consider the following function definition:
let f =
    let pi = 3.14159
    let twoPi = 2.0 * pi
    twoPi;;
Note that F#, like ML, does not coerce numeric values, so if this function used 2 in the second-to-last line, rather than 2.0, an error would be reported.
If a function is recursive, the reserved word rec must precede its name in its definition. Following is an F# version of factorial:
let rec factorial x =
    if x <= 1 then 1
    else x * factorial(x - 1)
Names defined in functions can be outscoped, which means they can be redefined, which ends their former scope. For example, we could have the following:
let x4 x =
    let x = x * x
    let x = x * x
    x;;
In this function, the first let in the body of the x4 function creates a new version of x, defining it to have the value of the square of the parameter x. This terminates the scope of the parameter. So, the second let in the function body uses the new x in its right side and creates yet another version of x, thereby terminating the scope of the x created in the previous let.
There are two important functional operators in F#, pipeline (|>) and function composition (>>). The pipeline operator is a binary operator that sends the value of its left operand, which is an expression, to the last parameter of the function call, which is the right operand. It is used to chain together function calls while flowing the data being processed to each call. Consider the following example code, which uses the higher-order functions filter and map:
let myNums = [1; 2; 3; 4; 5]
let evensTimesFive = myNums
                     |> List.filter (fun n -> n % 2 = 0)
                     |> List.map (fun n -> 5 * n)
The evensTimesFive function begins with the list myNums, filters out the numbers that are not even with filter, and uses map to map a lambda expression that multiplies the numbers in a given list by five. The return value of evensTimesFive is [10; 20].
The function composition operator builds a function that applies its left operand, which is a function, to a given parameter and then passes the result returned from that function to its right operand, which is also a function. So, the F# expression (f >> g) x is equivalent to the mathematical expression g(f(x)).
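Python has no built-in pipeline or composition operator, but both ideas can be sketched with ordinary functions. In the following, pipe and compose are hypothetical helpers written for this illustration; only functools.reduce comes from the standard library.

```python
from functools import reduce

def pipe(value, *functions):
    # Thread value through each function in turn, like value |> f |> g.
    return reduce(lambda acc, fn: fn(acc), functions, value)

def compose(f, g):
    # Like f >> g in F#: apply f first, then apply g to its result.
    return lambda x: g(f(x))

my_nums = [1, 2, 3, 4, 5]
evens_times_five = pipe(
    my_nums,
    lambda ns: [n for n in ns if n % 2 == 0],   # keep the even numbers
    lambda ns: [5 * n for n in ns],             # multiply each by five
)
print(evens_times_five)                               # [10, 20]

add_one_then_double = compose(lambda x: x + 1, lambda x: x * 2)
print(add_one_then_double(3))                         # 8
```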
Like ML, F# supports curried functions and partial evaluation. The ML example in Section 15.7 could be written in F# as follows:
let add a b = a + b;;
let add5 = add 5;;
Note that, unlike ML, the syntax of the formal parameter list in F# is the same for all functions, so all functions with more than one parameter can be curried.
F# is interesting for several reasons. First, as a functional language, it builds on earlier functional languages. Second, it supports virtually all programming methodologies in widespread use today. Third, it is the first functional language that is designed for interoperability with other widely used languages. Fourth, it starts out with an elaborate and well-developed IDE and library of utility software with .NET and its framework.
Imperative programming languages have typically provided only limited support for functional programming. That limited support has resulted in little use of those languages for functional programming. The most important restriction, related to functional programming, of imperative languages of the past was the lack of support for higher-order functions.
One indication of the increasing interest in and use of functional programming is the partial support for it that has begun to appear over the last decade in programming languages that are primarily imperative. For example, anonymous functions, which are like lambda expressions, are now part of JavaScript, Python, Ruby, Java, and C#.
In JavaScript, named functions are defined with the following syntax:
function name (formal-parameters) {
    body
}
An anonymous function is defined in JavaScript with the same syntax, except that the name of the function is omitted.
In C#, lambda expressions are instances of delegates. They can be anonymous or named. An anonymous lambda expression is simpler than an anonymous method because methods must define their parameter types and return type, but lambda expressions use the C# inferencing process to avoid those necessities. The syntax of an unnamed lambda expression in C# is as follows:
parameter(s) => expression
If there is more than one parameter, they must be enclosed in parentheses. If there are no parameters, empty parentheses must appear in the place of the parameters. If the system cannot infer the types of the parameters, they can be preceded by type names. The type of the return value is never specified; it is always inferred by the context of the lambda expression. The expression is either a single expression or a compound statement enclosed in braces. Such a compound statement must include a return statement.
One common use of anonymous lambda expressions is as actual parameters to methods that are specified to take delegates as parameters. For example, C# has a collection of methods for arrays that perform searching operations. One of these, the FindAll method, finds all of the elements of an array that satisfy a given instance of a delegate that takes a parameter of the type of the array’s elements and returns a Boolean value. We could have the following:
int[] numbers = {-3, 0, 4, 5, 1, 7, -3, -6, -9, 0, 3};
int[] positives = Array.FindAll(numbers, n => n > 0);
// Now, positives is {4, 5, 1, 7, 3}
Lambda expressions in C# can also be named. The language has generic delegates that make defining such lambda expressions simple, although they do not cover all possibilities. One commonly used generic delegate is Func, which can take up to sixteen generic parameters for the parameters of the lambda expression, plus one more for the return type. For example, consider the following named lambda expression and an invocation of it:
Func<int, int, int> eval1 = (a, b) => 3 * a + (b / 2);
int result = eval1(6, 22);
C# lambda expressions can access variables defined outside their definitions. When they do, the lifetimes of the accessed external variables, which are called captured variables, are extended so that they still exist when the lambda expression is used. A lambda expression that captures outside variables is a closure.
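The same notion of a closure exists in Python, which makes it convenient for a small illustration. In this sketch, make_counter and increment are hypothetical names; the point is that the captured variable count continues to exist after make_counter has returned.

```python
def make_counter():
    count = 0                 # local to make_counter, captured below
    def increment():
        nonlocal count        # refer to the captured variable, not a new local
        count += 1
        return count
    return increment          # count outlives this call because it is captured

counter = make_counter()
print(counter(), counter())   # 1 2
```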
Lambda expressions were added to Java in Java 8. The general syntax of these expressions is like that of C#, except that -> is used instead of =>. The syntax of parameters, the inferencing of parameter types and the return type, and the expression or block with a return are the same as with C#. Prior to Java 8, there was no convenient way to pass a block of code to a method or have the method return a block of code.14
Python’s lambda expressions define simple single-expression anonymous functions. The syntax of a lambda expression in Python is exemplified by the following:
lambda a, b : 2 * a - b
Note that the formal parameters are separated from the function body by a colon.
Python includes the higher-order functions filter and map. Both often use lambda expressions as their first parameter. The second parameter of these is a sequence or other iterable. In Python, strings, lists, and tuples are considered sequences. In Python 3, both functions return lazy iterators rather than sequences, so the result is usually passed to list to obtain a list. Consider the following example of using the map function in Python:
list(map(lambda x: x ** 3, [2, 4, 6, 8]))
This call returns [8, 64, 216, 512].
Python also supports partial function applications. Consider the following example:
from functools import partial
from operator import add
add5 = partial(add, 5)
The first from declaration imports partial from the functools module. The second imports the functional version of the addition operator, which is named add, from the operator module.
After add5 is defined, it can be used with one parameter, as in the following:
add5(15)
This call returns 20.
As described in Chapter 6, Python includes lists and list comprehensions.
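To connect the two styles, the following sketch computes the same result once with filter and map and once with a list comprehension. The data and variable names are made up for the illustration.

```python
numbers = [-3, 0, 4, 5, 1, 7]

# Higher-order functions: keep the positive values, then cube them.
with_functions = list(map(lambda x: x ** 3, filter(lambda x: x > 0, numbers)))

# The equivalent list comprehension.
with_comprehension = [x ** 3 for x in numbers if x > 0]

print(with_functions)                          # [64, 125, 1, 343]
print(with_functions == with_comprehension)    # True
```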
Ruby’s blocks are effectively subprograms that are sent to methods, which makes the method a higher-order subprogram. A Ruby block can be converted to a subprogram object with lambda. For example, consider the following:
times = lambda {|a, b| a * b}
Following is an example of using times:
x = times.(3, 4)
This sets x to 12. The times object can be curried with the following:
times5 = times.curry.(5)
This function can be used as in the following:
x5 = times5.(3)
This sets x5 to 15.
This section discusses some of the differences between imperative and functional languages.
Functional languages can have a very simple syntactic structure. The list structure of Lisp, which is used for both code and data, clearly illustrates this. The syntax of the imperative languages is much more complex. This makes them more difficult to learn and to use.
The semantics of functional languages is also simpler than that of the imperative languages. For example, in the denotational semantics description of an imperative loop construct given in Section 3.5.2, the loop is converted from an iterative construct to a recursive construct. This conversion is unnecessary in a pure functional language, in which there is no iteration. Furthermore, we assumed there were no expression side effects in all of the denotational semantic descriptions of imperative constructs in Chapter 3. This restriction is unrealistic for imperative languages, because all of the C-based languages include expression side effects. This restriction is not needed for the denotational descriptions of pure functional languages.
Some in the functional programming community have claimed that the use of functional programming results in an order-of-magnitude increase in productivity, largely because functional programs are claimed to be only 10 percent as large as their imperative counterparts. While such numbers have actually been shown for certain problem areas, for other problem areas, functional programs are more like 25 percent as large as imperative solutions to the same problems (Wadler, 1998). These factors allow proponents of functional programming to claim productivity advantages over imperative programming of 4 to 10 times. However, program size alone is not necessarily a good measure of productivity. Certainly not all lines of source code have equal complexity, nor do they take the same amount of time to produce. In fact, because of the necessity of dealing with variables, imperative programs have many trivially simple lines for initializing and making small changes to variables.
Execution efficiency is another basis for comparison. When functional programs are interpreted, they are of course much slower than their compiled imperative counterparts. However, there are now compilers for most functional languages, so that execution speed disparities between functional languages and compiled imperative languages are no longer so great. One might be tempted to say that because functional programs are significantly smaller than equivalent imperative programs, they should execute much faster than the imperative programs. However, this often is not the case, because of a collection of language characteristics of the functional languages, such as lazy evaluation, that have a negative impact on execution efficiency. Considering the relative efficiency of functional and imperative programs, it is reasonable to estimate that an average functional program will execute in about twice the time of its imperative counterpart (Wadler, 1998). This may sound like a significant difference, one that would often lead one to dismiss the functional languages for a given application. However, this factor-of-two difference is important only in situations where execution speed is of the utmost importance. There are many situations where a factor of two in execution speed is not considered important. For example, consider that many programs written in imperative languages, such as the Web software written in JavaScript and PHP, are interpreted and therefore are much slower than equivalent compiled versions. For these applications, execution speed is not the first priority.
Another source of the difference in execution efficiency between functional and imperative programs is the fact that imperative languages were designed to run efficiently on von Neumann architecture computers, while the design of functional languages is based on mathematical functions. This gives the imperative languages a large advantage.
Functional languages have a potential advantage in readability. In many imperative programs, the details of dealing with variables obscure the logic of the program. Consider a function that computes the sum of the cubes of the first n positive integers. In C, such a function would likely appear similar to the following:
int sum_cubes(int n){
    int sum = 0;
    for(int index = 1; index <= n; index++)
        sum += index * index * index;
    return sum;
}
In Haskell, the function could be:
sumCubes n = sum (map (^3) [1..n])
This version simply specifies three steps:
Build the list of numbers ([1..n]).
Create a new list by mapping a function that computes the cube of a number onto each number in the list.
Sum the new list.
Because of the lack of details of variables and iteration control, this version is more readable than the C version.15
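The same three steps can be written in a functional style in a primarily imperative language. The following Python sketch is one possible rendering; the name sum_cubes follows the C version.

```python
def sum_cubes(n):
    # Build 1..n, map the cube function over it, and sum the results.
    return sum(map(lambda i: i ** 3, range(1, n + 1)))

print(sum_cubes(4))   # 100
```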
Concurrent execution in the imperative languages is difficult to design and difficult to use, as we saw in Chapter 13. In an imperative language, the programmer must make a static division of the program into its concurrent parts, which are then written as tasks, whose execution often must be synchronized. This can be a complicated process. Programs in functional languages are naturally divided into functions. In a pure functional language, these functions are independent in the sense that they do not create side effects and their operations do not depend on any nonlocal or global variables. Therefore, it is much easier to determine which of them can be concurrently executed. The actual parameter expressions in calls often can be evaluated concurrently. Simply by specifying that it can be done, a function can be implicitly evaluated in a separate thread, as in Multilisp. And, of course, access to shared immutable data does not require synchronization.
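As a small illustration of why independence matters, the sketch below applies a pure function to several arguments using a thread pool from Python's standard concurrent.futures module. It shows only the structure: because cube touches no shared state, the calls need no synchronization and may run in any order. In standard CPython builds, threads are not expected to speed up CPU-bound work such as this, so the example should not be read as a performance claim.

```python
from concurrent.futures import ThreadPoolExecutor

def cube(n):
    # Pure: the result depends only on n; no shared state is read or written.
    return n * n * n

with ThreadPoolExecutor(max_workers=4) as pool:
    # The calls may overlap; map still returns results in input order.
    results = list(pool.map(cube, range(1, 6)))

print(results)   # [1, 8, 27, 64, 125]
```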
One simple factor that strongly affects the complexity of imperative, or procedural, programming is the attention the programmer must pay to the state of the program at each step of its development. In a large program, the state of the program is a large number of values (for the large number of program variables). In pure functional programming, there is no state; hence, there is no need to devote attention to keeping it in mind.
确定函数式语言为何未能获得更大的普及并非易事。早期实现的低效率显然是当时的一个因素,而且至少一些当代命令式程序员可能仍然认为用函数式语言编写的程序很慢。此外,绝大多数程序员都是使用命令式语言学习编程的,这使得函数式程序在他们看来很奇怪,很难理解。对于许多习惯于命令式编程的人来说,转向函数式编程是一个没有吸引力且可能很困难的举动。另一方面,那些从函数式语言开始的人从未注意到函数式程序有什么奇怪之处。
It is not a simple matter to determine precisely why functional languages have not attained greater popularity. The inefficiency of the early implementations was clearly a factor then, and it is likely that at least some contemporary imperative programmers still believe that programs written in functional languages are slow. In addition, the vast majority of programmers learn programming using imperative languages, which makes functional programs appear to them to be strange and difficult to understand. For many who are comfortable with imperative programming, the switch to functional programming is an unattractive and potentially difficult move. On the other hand, those who begin with a functional language never notice anything strange about functional programs.
数学函数是命名或未命名的映射,仅使用条件表达式和递归来控制其求值。复杂函数可以使用高阶函数或函数形式来定义,其中函数用作参数、返回值或两者。
Mathematical functions are named or unnamed mappings that use only conditional expressions and recursion to control their evaluations. Complex functions can be defined using higher-order functions or functional forms, in which functions are used as parameters, returned values, or both.
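A small Python sketch may help make the notion of a functional form concrete. The names `compose` and `apply_to_all` are assumptions chosen for illustration: the first takes two functions and returns a function, and the second takes a function as a parameter.

```python
def compose(f, g):
    """Return the function h such that h(x) = f(g(x))."""
    return lambda x: f(g(x))


def apply_to_all(f, items):
    """Apply f to every element of items and return the list of results."""
    return [f(x) for x in items]


# h(x) = (3 * x) + 2, built from two unnamed functions
h = compose(lambda x: x + 2, lambda x: 3 * x)
```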
函数式编程语言以数学函数为模型。在纯粹的形式中,它们不使用变量或赋值语句来产生结果;相反,它们使用函数应用、条件表达式和递归来控制执行,并使用函数形式来构造复杂函数。Lisp 最初是一种纯粹的函数式语言,但很快就添加了许多命令式语言特性,以提高其效率和易用性。
Functional programming languages are modeled on mathematical functions. In their pure form, they do not use variables or assignment statements to produce results; rather, they use function applications, conditional expressions, and recursion for execution control and functional forms to construct complex functions. Lisp began as a purely functional language but soon acquired a number of imperative-language features added in order to increase its efficiency and ease of use.
Lisp 的第一个版本源于人工智能应用对列表处理语言的需求。Lisp 仍然是该应用领域使用最广泛的语言。
The first version of Lisp grew out of the need for a list-processing language for AI applications. Lisp is still the most widely used language for that application area.
Lisp 的首次实现纯属偶然:EVAL开发的原始版本仅仅是为了证明可以编写通用的 Lisp 函数。
The first implementation of Lisp was serendipitous: The original version of EVAL was developed solely to demonstrate that a universal Lisp function could be written.
由于 Lisp 数据和 Lisp 程序具有相同的形式,因此可以用一个程序构建另一个程序。 的可用性EVAL允许动态构建的程序立即执行。
Because Lisp data and Lisp programs have the same form, it is possible to have a program build another program. The availability of EVAL allows dynamically constructed programs to be executed immediately.
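The following Python sketch imitates this idea on a very small scale. It is not Lisp's EVAL: the nested-list representation of expressions, the tiny `evaluate` function, and the helper `make_adder_program` are all assumptions made for illustration.

```python
import operator

# Operators understood by the toy evaluator (an illustrative subset).
OPS = {'+': operator.add, '-': operator.sub, '*': operator.mul}


def evaluate(expr):
    """Evaluate a prefix expression written as a nested list, e.g. ['+', 1, 2]."""
    if isinstance(expr, list):
        op, *args = expr
        values = [evaluate(arg) for arg in args]
        result = values[0]
        for value in values[1:]:
            result = OPS[op](result, value)
        return result
    return expr  # a number evaluates to itself


def make_adder_program(numbers):
    """Build, as ordinary list data, a program that adds the given numbers."""
    return ['+'] + list(numbers)
```

The list returned by `make_adder_program` is plain data until it is handed to `evaluate`, at which point it is treated as a program and executed.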
Scheme 是 Lisp 的一个相对简单的方言,它只使用静态作用域。与 Lisp 一样,Scheme 的主要原语包括用于构造和分解列表的函数、用于条件表达式的函数以及用于数字、符号和列表的简单谓词。
Scheme is a relatively simple dialect of Lisp that uses static scoping exclusively. Like Lisp, Scheme’s primary primitives include functions for constructing and dismantling lists, functions for conditional expressions, and simple predicates for numbers, symbols, and lists.
Common Lisp 是一种基于 Lisp 的语言,旨在包含 20 世纪 80 年代早期 Lisp 方言的大多数功能。它允许使用静态和动态范围变量,并包含许多命令式功能。Common Lisp 使用宏来定义其部分函数。用户可以定义自己的宏。该语言包括读取器宏,这些宏也是用户可定义的。读取器宏定义单符号宏。
Common Lisp is a Lisp-based language that was designed to include most of the features of the Lisp dialects of the early 1980s. It allows both static- and dynamic-scoped variables and includes many imperative features. Common Lisp uses macros to define some of its functions. Users are allowed to define their own macros. The language includes reader macros, which are also user definable. Reader macros define single-symbol macros.
ML 是一种静态作用域和强类型的函数式编程语言,其语法与命令式语言的语法更接近,而不是 Lisp。它包括类型推断系统、异常处理、各种数据结构和抽象数据类型。
ML is a static-scoped and strongly typed functional programming language that uses a syntax that is more closely related to that of an imperative language than to Lisp. It includes a type-inferencing system, exception handling, a variety of data structures, and abstract data types.
ML 不进行任何类型强制,也不允许函数重载。可以使用实际参数形式的模式匹配来定义多个函数定义。柯里化是将接受多个参数的函数替换为接受单个参数并返回接受其他参数的函数的过程。ML 以及其他几种函数式语言都支持柯里化。
ML does not do any type coercions and does not allow function overloading. Multiple definitions of functions can be defined using pattern matching of the actual parameter form. Currying is the process of replacing a function that takes multiple parameters with one that takes a single parameter and returns a function that takes the other parameters. ML, as well as several other functional languages, supports currying.
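Currying can be illustrated outside ML as well. The Python sketch below (the names `add`, `curried_add`, and `add5` are hypothetical) replaces a two-parameter function with one that takes a single parameter and returns a function that takes the other.

```python
def add(a, b):
    """Ordinary two-parameter function."""
    return a + b


def curried_add(a):
    """Curried version: takes a and returns a function that takes b."""
    def add_a(b):
        return a + b
    return add_a


# Supplying only the first parameter yields a new one-parameter function.
add5 = curried_add(5)
```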
Haskell 与 ML 类似,不同之处在于 Haskell 中的所有表达式都使用惰性方法求值,这允许程序处理无限列表。Haskell 还支持列表推导,这为描述集合提供了一种方便且熟悉的语法。与 ML 和 Scheme 不同,Haskell 是一种纯函数式语言。
Haskell is similar to ML, except that all expressions in Haskell are evaluated using a lazy method, which allows programs to deal with infinite lists. Haskell also supports list comprehensions, which provide a convenient and familiar syntax for describing sets. Unlike ML and Scheme, Haskell is a pure functional language.
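Python is not lazy in general, but its generators give a rough feel for how a program can work with an infinite list by computing only the elements that are actually demanded. In this sketch the names `squares` and `take` are assumptions.

```python
from itertools import count, islice


def squares():
    """Lazily yield the squares 0, 1, 4, 9, ... without end."""
    for n in count(0):
        yield n * n


def take(n, iterable):
    """Return a list of the first n elements of a (possibly infinite) iterable."""
    return list(islice(iterable, n))
```

Only the elements requested by `take` are ever computed, so the unbounded definition of `squares` causes no difficulty.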
F# 是一种 .NET 编程语言,支持函数式和命令式编程,包括面向对象编程。其函数式编程核心基于 OCaml(ML 和 Haskell 的后代)。F# 由精心设计且广泛使用的 IDE 支持。它还可以与其他 .NET 语言互操作,并可以访问 .NET 类库。
F# is a .NET programming language that supports functional and imperative programming, including object-oriented programming. Its functional programming core is based on OCaml, a descendant of ML and Haskell. F# is supported by an elaborate and widely used IDE. It also interoperates with other .NET languages and has access to the .NET class library.
Lisp 的第一个发布版本可以在McCarthy (1960)中找到。McCarthy等人 (1965)和Weissman (1967)描述了从 20 世纪 60 年代中期到 70 年代末广泛使用的版本。Steele (1990)描述了 Common Lisp。Dybvig (2011)描述了 Scheme 语言。Milner等人 (1997)定义了 ML。Ullman ( 1998)是一本很棒的 ML 入门教科书。Thompson (1999)介绍了 Haskell 编程。Syme等人 (2010)描述了 F# 。
The first published version of Lisp can be found in McCarthy (1960). A widely used version from the mid-1960s until the late 1970s is described in McCarthy et al. (1965) and Weissman (1967). Common Lisp is described in Steele (1990). The Scheme language is described in Dybvig (2011). ML is defined in Milner et al. (1997). Ullman (1998) is an excellent introductory textbook for ML. Programming in Haskell is introduced in Thompson (1999). F# is described in Syme et al. (2010).
本章中的 Scheme 程序是使用 DrRacket 的遗留语言 R5RS 开发的。
The Scheme programs in this chapter were developed using DrRacket’s legacy language R5RS.
关于函数式编程的一般严格讨论可以在Henderson (1980)中找到。Peyton Jones (1987)详细讨论了通过图形缩减实现函数式语言的过程。
A rigorous discussion of functional programming in general can be found in Henderson (1980). The process of implementing functional languages through graph reduction is discussed in detail in Peyton Jones (1987).
定义函数形式、简单列表、绑定变量和引用透明性。
Define functional form, simple list, bound variable, and referential transparency.
Lambda 表达式指定什么?
What does a lambda expression specify?
原始 Lisp 包含哪些数据类型?
What data types were parts of the original Lisp?
Lisp 列表通常存储在什么常见数据结构中?
In what common data structure are Lisp lists normally stored?
解释为什么QUOTE需要一个数据列表参数。
Explain why QUOTE is needed for a parameter that is a data list.
什么是简单列表?
What is a simple list?
REPL 这个缩写代表什么?
What does the abbreviation REPL stand for?
这三个参数是什么IF?
What are the three parameters to IF?
=、、、EQ?和EQV?之间有何区别EQUAL?
What are the differences between =, EQ?, EQV?, and EQUAL?
DEFINEScheme 特殊形式使用的评估方法与其原始函数使用的评估方法有何不同?
What are the differences between the evaluation method used for the Scheme special form DEFINE and that used for its primitive functions?
的两种形式是什么DEFINE?
What are the two forms of DEFINE?
描述 的语法和语义COND。
Describe the syntax and semantics of COND.
为何CAR如此CDR命名?
Why are CAR and CDR so named?
如果CONS用两个原子调用,比如'A和'B,返回的是什么?
If CONS is called with two atoms, say 'A and 'B, what is returned?
LET描述Scheme的语法和语义。
Describe the syntax and semantics of LET in Scheme.
CONS、LIST和之间有何区别APPEND?
What are the differences between CONS, LIST, and APPEND?
mapcar描述Scheme的语法和语义。
Describe the syntax and semantics of mapcar in Scheme.
什么是尾递归?为什么将使用递归指定重复的函数定义为尾递归很重要?
What is tail recursion? Why is it important to define functions that use recursion to specify repetition to be tail recursive?
为什么大多数 Lisp 方言都添加了命令式特性?
Why were imperative features added to most dialects of Lisp?
Common Lisp 和 Scheme 有哪些不同之处?
In what ways are Common Lisp and Scheme opposites?
Scheme 中使用了什么作用域规则?Common Lisp 中呢?ML 中呢?Haskell 中呢?F# 中呢?
What scoping rule is used in Scheme? In Common Lisp? In ML? In Haskell? In F#?
在 Common Lisp 语言处理器的读取器阶段会发生什么?
What happens during the reader phase of a Common Lisp language processor?
ML 与 Scheme 有哪两个根本区别?
What are two ways that ML is fundamentally different from Scheme?
ML 评估环境中存储了什么?
What is stored in an ML evaluation environment?
valC 语言中的 ML 语句和赋值语句有什么区别?
What is the difference between an ML val statement and an assignment statement in C?
机器学习中使用的类型推断是什么?
What is type inferencing, as used in ML?
fnML中的保留字有什么用途?
What is the use of the fn reserved word in ML?
处理标量数字的 ML 函数可以通用吗?
Can ML functions that deal with scalar numerics be generic?
什么是柯里化函数?
What is a curried function?
部分评价是什么意思?
What does partial evaluation mean?
描述 ML 函数的动作filter。
Describe the actions of the ML filter function.
ML 对 Scheme 的 CAR 使用什么运算符?
What operator does ML use for Scheme’s CAR?
ML 使用什么运算符进行函数组合?
What operator does ML use for functional composition?
Haskell 有哪三个特点使其不同于 ML?
What are the three characteristics of Haskell that make it different from ML?
惰性求值是什么意思?
What does lazy evaluation mean?
什么是严格的编程语言?
What is a strict programming language?
F# 支持哪些编程范例?
What programming paradigms are supported by F#?
F# 可以与哪些其他编程语言互操作?
With what other programming languages can F# interoperate?
F# 起什么作用let?
What does F#’s let do?
F# 构造的范围如何let终止?
How is the scope of an F# let construct terminated?
F# 中序列和列表之间的根本区别是什么?
What is the underlying difference between a sequence and a list in F#?
就程度而言,ML 的 let 和 F# 的 let 有何区别?
What is the difference between the let of ML and that of F#, in terms of extent?
F# 中 lambda 表达式的语法是什么?
What is the syntax of a lambda expression in F#?
F# 是否强制表达式中的数值? 争论以支持设计选择。
Does F# coerce numeric values in expressions? Argue in support of the design choice.
Python 为函数式编程提供了什么支持?
What support does Python provide for functional programming?
Ruby 中的哪个函数用于创建柯里化函数?
What function in Ruby is used to create a curried function?
函数式编程的使用范围在扩大还是在缩小?
Is the use of functional programming expanding or shrinking?
函数式编程语言的哪个特点使得它们的语义比命令式语言更简单?
What is one characteristic of functional programming languages that makes their semantics simpler than that of imperative languages?
使用代码行数来比较函数式语言和命令式语言的生产力有什么缺陷?
What is the flaw in using lines of code to compare the productivity of functional languages and that of imperative languages?
为什么函数式语言比命令式语言更容易实现并发?
Why can concurrency be easier with functional languages than imperative languages?
阅读 John Backus 关于 FP 的论文 (Backus, 1978),并将本章讨论的 Scheme 的特性与 FP 的相应特性进行比较。
Read John Backus’s paper on FP (Backus, 1978) and compare the features of Scheme discussed in this chapter with the corresponding features of FP.
找到 Scheme 函数EVAL和的定义APPLY,并解释其作用。
Find definitions of the Scheme functions EVAL and APPLY, and explain their actions.
最现代、最完整的编程环境之一是 Lisp 的 INTERLISP 系统,如 Teitelmen 和 Masinter 所著的《INTERLISP 编程环境》(IEEE 计算机,第 14 卷,第 4 期,1981 年 4 月)中所述。请仔细阅读本文,并将在您的系统上编写 Lisp 程序的难度与使用 INTERLISP 的难度进行比较(假设您通常不使用 INTERLISP)。
One of the most modern and complete programming environments is the INTERLISP system for Lisp, as described in “The INTERLISP Programming Environment,” by Teitelman and Masinter (IEEE Computer, Vol. 14, No. 4, April 1981). Read this article carefully and compare the difficulty of writing Lisp programs on your system with that of using INTERLISP (assuming that you do not normally use INTERLISP).
参考有关 Lisp 编程的书籍,并确定哪些参数支持在 Lisp 中包含该PROG功能。
Refer to a book on Lisp programming and determine what arguments support the inclusion of the PROG feature in Lisp.
找到至少一个使用类型化函数编程语言构建以下每个领域的商业系统的示例:数据库处理、金融建模、统计分析和生物信息学。
Find at least one example of a typed functional programming language being used to build a commercial system in each of the following areas: database processing, financial modeling, statistical analysis, and bioinformatics.
函数式语言可以使用列表以外的其他数据结构。例如,它可以使用单字符符号的字符串。这种语言将使用哪些原语来代替Scheme 的CAR、CDR和CONS原语?
A functional language could use some data structure other than the list. For example, it could use strings of single-character symbols. What primitives would such a language have in place of the CAR, CDR, and CONS primitives of Scheme?
列出 F# 中不包含在 ML 中的功能。
Make a list of the features of F# that are not in ML.
如果 Scheme 是一种纯函数式语言,那么它可以包含吗DISPLAY?为什么或为什么不?
If Scheme were a pure functional language, could it include DISPLAY? Why or why not?
以下 Scheme 函数的作用是什么?
What does the following Scheme function do?
(define (y s lis)
(cond
((null? lis) '() )
((equal? s (car lis)) lis)
(else (y s (cdr lis)))
))
以下 Scheme 函数的作用是什么?
What does the following Scheme function do?
(define (x lis)
  (cond
    ((null? lis) 0)
    ((not (list? (car lis)))
      (cond
        ((eq? (car lis) #f) (x (cdr lis)))
        (else (+ 1 (x (cdr lis))))))
    (else (+ (x (car lis)) (x (cdr lis))))
))
编写一个 Scheme 函数,根据球体的半径计算球体的体积。
Write a Scheme function that computes the volume of a sphere, given its radius.
编写一个 Scheme 函数,计算给定二次方程的实根。如果根是复数,则该函数必须显示一条消息来表明这一点。该函数必须使用一个IF函数。该函数的三个参数是二次方程的三个系数。
Write a Scheme function that computes the real roots of a given quadratic equation. If the roots are complex, the function must display a message indicating that. This function must use an IF function. The three parameters to the function are the three coefficients of the quadratic equation.
COND使用函数而不是函数重复编程练习 2 IF。
Repeat Programming Exercise 2 using a COND function, rather than an IF function.
编写一个 Scheme 函数,它接受两个数字参数A和B,并返回A其幂B。
Write a Scheme function that takes two numeric parameters, A and B, and returns A raised to the B power.
编写一个 Scheme 函数,返回给定的简单数字列表中零的数量。
Write a Scheme function that returns the number of zeros in a given simple list of numbers.
编写一个 Scheme 函数,以一个简单的数字列表作为参数,并返回输入列表中最大和最小数字的列表。
Write a Scheme function that takes a simple list of numbers as a parameter and returns a list with the largest and smallest numbers in the input list.
编写一个 Scheme 函数,以列表和原子作为参数,并返回与其参数列表相同的列表,但删除了给定原子的所有顶级实例。
Write a Scheme function that takes a list and an atom as parameters and returns a list identical to its parameter list except with all top-level instances of the given atom deleted.
编写一个 Scheme 函数,以列表作为参数,并返回与参数相同的列表,只是最后一个元素已被删除。
Write a Scheme function that takes a list as a parameter and returns a list identical to the parameter except the last element has been deleted.
重复编程练习 7,不同之处在于原子可以是原子也可以是列表。
Repeat Programming Exercise 7, except that the atom can be either an atom or a list.
编写一个 Scheme 函数,该函数以两个原子和一个列表作为参数,并返回一个与参数列表相同的列表,只是列表中出现的所有第一个给定原子都被替换为第二个给定的原子,无论第一个原子嵌套的深度有多深。
Write a Scheme function that takes two atoms and a list as parameters and returns a list identical to the parameter list except all occurrences of the first given atom in the list are replaced with the second given atom, no matter how deeply the first atom is nested.
编写一个 Scheme 函数,返回其简单列表参数的反转。
Write a Scheme function that returns the reverse of its simple list parameter.
编写一个 Scheme 谓词函数,测试两个给定列表的结构相等性。如果两个列表具有相同的列表结构,则它们在结构上相等,尽管它们的原子可能不同。
Write a Scheme predicate function that tests for the structural equality of two given lists. Two lists are structurally equal if they have the same list structure, although their atoms may be different.
编写一个 Scheme 函数,返回代表集合的两个简单列表参数的并集。
Write a Scheme function that returns the union of two simple list parameters that represent sets.
编写一个带有两个参数(一个原子和一个列表)的 Scheme 函数,返回一个与参数列表相同的列表,但删除了给定原子的所有出现位置(无论多深)。返回的列表不能包含任何替代已删除原子的内容。
Write a Scheme function with two parameters, an atom and a list, that returns a list identical to the parameter list except with all occurrences, no matter how deep, of the given atom deleted. The returned list cannot contain anything in place of the deleted atoms.
编写一个 Scheme 函数,以列表为参数并返回一个与参数列表相同的列表,但删除了第二个顶级元素。如果给定的列表没有两个元素,则该函数应返回()。
Write a Scheme function that takes a list as a parameter and returns a list identical to the parameter list except with the second top-level element removed. If the given list does not have two elements, the function should return ().
编写一个 Scheme 函数,以简单的数字列表作为参数,并返回与参数列表相同的列表,但数字按升序排列。
Write a Scheme function that takes a simple list of numbers as its parameter and returns a list identical to the parameter list except with the numbers in ascending order.
编写一个 Scheme 函数,以一个简单的数字列表作为参数,并返回列表中最大和最小的数字。
Write a Scheme function that takes a simple list of numbers as its parameter and returns the largest and smallest numbers in the list.
编写一个 Scheme 函数,以简单列表作为参数,并返回给定列表的所有排列的列表。
Write a Scheme function that takes a simple list as its parameter and returns a list of all permutations of the given list.
用 Scheme 编写快速排序算法。
Write the quicksort algorithm in Scheme.
将以下 Scheme 函数重写为尾递归函数:
Rewrite the following Scheme function as a tail-recursive function:
(DEFINE (doit n)
(IF (= n 0)
0
(+ n (doit (- n 1)))
))
用 F# 编写前 19 个编程练习中的任意一个。
Write any of the first 19 Programming Exercises in F#.
编写 ML 中的前 19 个编程练习中的任意一个。
Write any of the first 19 Programming Exercises in ML.
本章的目的是介绍逻辑编程和逻辑编程语言的概念,包括对 Prolog 子集的简要描述。我们首先介绍谓词演算,它是逻辑编程语言的基础。然后讨论如何将谓词演算用于自动定理证明系统。然后,我们概述逻辑编程。接下来,一个较长的部分介绍了 Prolog 编程语言的基础知识,包括算术、列表处理和跟踪工具,可用于帮助调试程序并说明 Prolog 系统的工作原理。最后两节描述了 Prolog 作为逻辑语言的一些问题以及 Prolog 的一些应用领域。
The objectives of this chapter are to introduce the concepts of logic programming and logic programming languages, including a brief description of a subset of Prolog. We begin with an introduction to predicate calculus, which is the basis for logic programming languages. This is followed by a discussion of how predicate calculus can be used for automatic theorem-proving systems. Then, we present a general overview of logic programming. Next, a lengthy section introduces the basics of the Prolog programming language, including arithmetic, list processing, and a trace tool that can be used to help debug programs and also to illustrate how the Prolog system works. The final two sections describe some of the problems of Prolog as a logic language and some of the application areas in which Prolog has been used.
第15章讨论了函数式编程范式,它与命令式语言所使用的软件开发方法有很大不同。在本章中,我们将描述另一种不同的编程方法。在这种情况下,方法是用符号逻辑的形式来表达程序,并使用逻辑推理过程来产生结果。逻辑程序是声明性的,而不是程序性的,这意味着只陈述了所需结果的规范,而不是产生这些结果的详细过程。逻辑编程语言中的程序是事实和规则的集合。这样的程序通过向它提出问题来使用,它试图通过查阅事实和规则来回答这些问题。“查阅”在这里可能具有误导性,因为这个过程远比这个词所暗示的要复杂得多。这种解决问题的方法听起来似乎只解决一类非常狭窄的问题,但它比人们想象的要灵活得多。
Chapter 15 discusses the functional programming paradigm, which is significantly different from the software development methodologies used with the imperative languages. In this chapter, we describe another different programming methodology. In this case, the approach is to express programs in a form of symbolic logic and use a logical inferencing process to produce results. Logic programs are declarative rather than procedural, which means that only the specifications of the desired results are stated rather than detailed procedures for producing them. Programs in logic programming languages are collections of facts and rules. Such a program is used by asking it questions, which it attempts to answer by consulting the facts and rules. “Consulting” is perhaps misleading here, for the process is far more complex than that word connotes. This approach to problem solving may sound like it addresses only a very narrow category of problems, but it is more flexible than might be thought.
使用符号逻辑形式作为编程语言的编程通常称为逻辑编程,而基于符号逻辑的语言称为逻辑编程语言或声明性语言。我们选择描述逻辑编程语言 Prolog,因为它是唯一广泛使用的逻辑语言。
Programming that uses a form of symbolic logic as a programming language is often called logic programming, and languages based on symbolic logic are called logic programming languages, or declarative languages. We have chosen to describe the logic programming language Prolog, because it is the only widely used logic language.
逻辑编程语言的语法与命令式和函数式语言的语法有显著不同。逻辑程序的语义也与命令式语言程序的语义大不相同。这些观察应该会让读者对逻辑编程和声明式语言的本质产生好奇。
The syntax of logic programming languages is remarkably different from that of the imperative and functional languages. The semantics of logic programs also bears little resemblance to that of imperative-language programs. These observations should lead the reader to some curiosity about the nature of logic programming and declarative languages.
在讨论逻辑编程之前,我们必须简要研究一下它的基础,即形式逻辑。这不是我们在本文中第一次接触形式逻辑;它在第3章 描述的公理语义中被广泛使用。
Before we can discuss logic programming, we must briefly investigate its basis, which is formal logic. This is not our first contact with formal logic in this text; it was used extensively in the axiomatic semantics described in Chapter 3.
命题可以被认为是可能为真也可能为假的逻辑陈述。它由对象和对象之间的关系组成。形式逻辑的开发是为了提供一种描述命题的方法,目的是允许检查这些正式陈述的命题的有效性。
A proposition can be thought of as a logical statement that may or may not be true. It consists of objects and the relationships among objects. Formal logic was developed to provide a method for describing propositions, with the goal of allowing those formally stated propositions to be checked for validity.
符号逻辑可以用于形式逻辑的三个基本需求:表达命题、表达命题之间的关系、描述如何从假定为真的其他命题推断出新命题。
Symbolic logic can be used for the three basic needs of formal logic: to express propositions, to express the relationships between propositions, and to describe how new propositions can be inferred from other propositions that are assumed to be true.
形式逻辑与数学之间有着密切的关系。事实上,很多数学都可以用逻辑来思考。数论和集合论的基本公理是一组初始命题,这些命题被假定为真。定理是可以从初始命题集合中推断出的附加命题。
There is a close relationship between formal logic and mathematics. In fact, much of mathematics can be thought of in terms of logic. The fundamental axioms of number and set theory are the initial set of propositions, which are assumed to be true. Theorems are the additional propositions that can be inferred from the initial set.
用于逻辑编程的特定形式的符号逻辑称为一阶谓词演算(虽然有点不精确,但我们通常将其称为谓词演算)。在以下小节中,我们将简要介绍谓词演算。我们的目标是为讨论逻辑编程和逻辑编程语言 Prolog 奠定基础。
The particular form of symbolic logic that is used for logic programming is called first-order predicate calculus (though it is a bit imprecise, we will usually refer to it as predicate calculus). In the following subsections, we present a brief look at predicate calculus. Our goal is to lay the groundwork for a discussion of logic programming and the logic programming language Prolog.
逻辑编程命题中的对象用简单的术语表示,这些术语要么是常量,要么是变量。常量是代表对象的符号。变量是可以在不同时间代表不同对象的符号,尽管在某种意义上,它比命令式编程语言中的变量更接近数学。
The objects in logic programming propositions are represented by simple terms, which are either constants or variables. A constant is a symbol that represents an object. A variable is a symbol that can represent different objects at different times, although in a sense that is far closer to mathematics than the variables in an imperative programming language.
最简单的命题称为原子命题,由复合项组成。复合项是数学关系的一个元素,以数学函数符号的形式书写。回想一下第15章 ,数学函数是一种映射,可以表示为表达式或元组表或列表。复合项是函数表格定义的元素。
The simplest propositions, which are called atomic propositions, consist of compound terms. A compound term is one element of a mathematical relation, written in a form that has the appearance of mathematical function notation. Recall from Chapter 15 that a mathematical function is a mapping, which can be represented either as an expression or as a table or list of tuples. Compound terms are elements of the tabular definition of a function.
复合词由两部分组成:函子(即命名关系的函数符号)和有序的参数列表,它们共同表示关系的一个元素。具有单个参数的复合词是 1 元组;具有两个参数的复合词是 2 元组,依此类推。例如,我们可能有两个命题
A compound term is composed of two parts: a functor, which is the function symbol that names the relation, and an ordered list of parameters, which together represent an element of the relation. A compound term with a single parameter is a 1-tuple; one with two parameters is a 2-tuple, and so forth. For example, we might have the two propositions
man(jake)
like(bob, steak)
这表明 {jake} 在关系 man 中是 1 元组,而 {bob, steak} 在关系 like 中是 2 元组。如果我们添加命题
which state that {jake} is a 1-tuple in the relation named man, and that {bob, steak} is a 2-tuple in the relation named like. If we added the proposition
man(fred)
和前两个命题相比,关系 man 会有两个不同的元素,{jake} 和 {fred}。这些命题中的所有简单术语(man、jake、like、bob 和 steak)都是常量。请注意,这些命题没有内在语义。它们的意思是我们想要的。例如,第二个例子可能意味着 bob 喜欢牛排,或者牛排喜欢 bob,或者 bob 在某种程度上类似于牛排。
to the two previous propositions, then the relation man would have two distinct elements, {jake} and {fred}. All of the simple terms in these propositions—man, jake, like, bob, and steak—are constants. Note that these propositions have no intrinsic semantics. They mean whatever we want them to mean. For example, the second example may mean that bob likes steak, or that steak likes bob, or that bob is in some way similar to a steak.
命题可以用两种方式表述:一种是命题被定义为真,另一种是命题的真实性需要确定。换句话说,命题可以是事实,也可以是疑问。上面的示例命题可以是事实或疑问。
Propositions can be stated in two modes: one in which the proposition is defined to be true, and one in which the truth of the proposition is something that is to be determined. In other words, propositions either can be facts or queries. The example propositions above could be either.
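The view of a relation as a set of tuples, and the two modes of a proposition, can be sketched in Python. The representation is an assumption made for illustration: each relation is a set of tuples, adding a tuple states a fact, and the hypothetical function `holds` poses a query.

```python
# Facts: each relation is the set of tuples defined to be true.
man = {("jake",), ("fred",)}
like = {("bob", "steak")}


def holds(relation, *args):
    """Query: is the tuple of args an element of the relation?"""
    return tuple(args) in relation
```

Note that `holds(like, "bob", "steak")` succeeds while `holds(like, "steak", "bob")` does not, because the parameters of a compound term are ordered. As in the text, nothing in the representation says what `like` means.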
复合命题有两个或多个原子命题,它们通过逻辑连接符或运算符连接,就像命令式语言中复合逻辑表达式的构造方式一样。谓词演算逻辑连接符的名称、符号和含义如下:
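Compound propositions have two or more atomic propositions, which are connected by logical connectors, or operators, in the same way compound logic expressions are constructed in imperative languages. The names, symbols, and meanings of the predicate calculus logical connectors are as follows:
Name          Symbol   Example   Meaning
negation      ¬        ¬a        not a
conjunction   ∩        a ∩ b     a and b
disjunction   ∪        a ∪ b     a or b
equivalence   ≡        a ≡ b     a is equivalent to b
implication   ⊃        a ⊃ b     a implies b
              ⊂        a ⊂ b     b implies a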
以下是复合命题的例子:
The following are examples of compound propositions:
a ∩ b ⊃ c
a ∩ ¬b ⊃ d
The ¬ operator has the highest precedence. The operators ∩, ∪, and ≡ all have higher precedence than ⊃ and ⊂. So, the second example above is equivalent to
(a ∩ (¬b)) ⊃ d
变量可以出现在命题中,但必须由称为量词的特殊符号引入。谓词演算包括两个量词,如下所述,其中X是变量,P是命题:
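Variables can appear in propositions but only when introduced by special symbols called quantifiers. Predicate calculus includes two quantifiers, as described below, where X is a variable and P is a proposition:
Name          Example   Meaning
universal     ∀X.P      For all X, P is true.
existential   ∃X.P      There exists a value of X such that P is true.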
X和P之间的句号只是将变量与命题分开。例如,考虑以下内容:
The period between X and P simply separates the variable from the proposition. For example, consider the following:
∀X.(woman(X) ⊃ human(X))
∃X.(mother(mary, X) ∩ male(X))
第一个命题意味着,对于任何X值,如果X是女性,则X是人类。第二个命题意味着存在一个X值,使得玛丽是X的母亲,并且X是男性;换句话说,玛丽有一个儿子。全称量词和存在量词的范围是它们所附加的原子命题。可以使用括号扩展此范围,就像刚才描述的两个复合命题一样。因此,全称量词和存在量词的优先级高于任何运算符。
The first of these propositions means that for any value of X, if X is a woman, then X is a human. The second means that there exists a value of X such that mary is the mother of X and X is a male; in other words, mary has a son. The scope of the universal and existential quantifiers is the atomic propositions to which they are attached. This scope can be extended using parentheses, as in the two compound propositions just described. So, the universal and existential quantifiers have higher precedence than any of the operators.
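Over a finite set of objects, the two quantifiers correspond closely to Python's built-in `all` and `any`. The sketch below checks the two example propositions against a small, entirely hypothetical set of facts (the names in `domain` and the contents of each relation are assumptions).

```python
# Hypothetical finite universe and facts, for illustration only.
domain = ["mary", "tom", "ann"]
woman = {"mary", "ann"}
human = {"mary", "tom", "ann"}
male = {"tom"}
mother = {("mary", "tom")}


def implies(a, b):
    """Material implication: a implies b."""
    return (not a) or b


# For all X: woman(X) implies human(X)
all_women_are_human = all(implies(x in woman, x in human) for x in domain)

# There exists X: mother(mary, X) and male(X)
mary_has_a_son = any(("mary", x) in mother and x in male for x in domain)
```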
我们之所以讨论谓词演算,是因为它是逻辑编程语言的基础。与其他语言一样,逻辑语言以最简单的形式呈现效果最佳,这意味着应尽量减少冗余。
We are discussing predicate calculus because it is the basis for logic programming languages. As with other languages, logic languages are best in their simplest form, meaning that redundancy should be minimized.
到目前为止,我们所描述的谓词演算的一个问题是,有太多不同的方式可以陈述具有相同含义的命题;也就是说,存在大量冗余。对于逻辑学家来说,这不是什么问题,但如果要在自动化(计算机化)系统中使用谓词演算,那么这是一个严重的问题。为了简化问题,需要一种标准的命题形式。子句形式是一种相对简单的命题形式,就是这样一种标准形式。所有命题都可以用子句形式来表达。子句形式的命题具有以下一般语法:
One problem with predicate calculus as we have described it thus far is that there are too many different ways of stating propositions that have the same meaning; that is, there is a great deal of redundancy. This is not such a problem for logicians, but if predicate calculus is to be used in an automated (computerized) system, it is a serious problem. To simplify matters, a standard form for propositions is desirable. Clausal form, which is a relatively simple form of propositions, is one such standard form. All propositions can be expressed in clausal form. A proposition in clausal form has the following general syntax:
B₁ ∪ B₂ ∪ … ∪ Bₙ ⊂ A₁ ∩ A₂ ∩ … ∩ Aₘ
其中A和B都是项。这种子句形式命题的含义如下:如果所有 A都为真,则至少有一个B为真。子句形式命题的主要特征如下:不需要存在量词;全称量词隐含在原子命题的变量使用中;不需要除合取和析取之外的其他运算符。而且,合取和析取只需按照一般子句形式中所示的顺序出现:析取在左侧,合取在右侧。所有谓词演算命题都可以通过算法转换为子句形式。Nilsson (1971)证明了这一点,并给出了一种简单的转换算法。
in which the A’s and B’s are terms. The meaning of this clausal form proposition is as follows: If all of the A’s are true, then at least one B is true. The primary characteristics of clausal form propositions are the following: Existential quantifiers are not required; universal quantifiers are implicit in the use of variables in the atomic propositions; and no operators other than conjunction and disjunction are required. Also, conjunction and disjunction need appear only in the order shown in the general clausal form: disjunction on the left side and conjunction on the right side. All predicate calculus propositions can be algorithmically converted to clausal form. Nilsson (1971) gives proof that this can be done, as well as a simple conversion algorithm for doing it.
小句形式命题的右侧称为前件。左侧称为后件,因为它是前件为真所产生的结果。作为小句形式命题的例子,请考虑以下内容:
The right side of a clausal form proposition is called the antecedent. The left side is called the consequent because it is the consequence of the truth of the antecedent. As examples of clausal form propositions, consider the following:
likes(bob, trout) ⊂ likes(bob, fish) ∩ fish(trout)
father(louis, al) ∪ father(louis, violet) ⊂ father(al, bob) ∩ mother(violet, bob) ∩ grandfather(louis, bob)
第一个句子的英文版本是这样的:如果鲍勃喜欢鱼,而鳟鱼是鱼,那么鲍勃就喜欢鳟鱼。第二个句子是这样的:如果艾尔是鲍勃的父亲,维奥莱特是鲍勃的母亲,路易斯是鲍勃的祖父,那么路易斯要么是艾尔的父亲,要么是维奥莱特的父亲。
The English version of the first of these states that if bob likes fish and a trout is a fish, then bob likes trout. The second states that if al is bob’s father and violet is bob’s mother and louis is bob’s grandfather, then louis is either al’s father or violet’s father.
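The meaning of a clausal form proposition (if all of the antecedent terms are true, then at least one consequent term is true) can be checked mechanically for variable-free clauses. In this Python sketch the representation is an assumption: each atomic proposition is a tuple, and `true_atoms` is the set of atomic propositions taken to be true.

```python
def clause_holds(consequents, antecedents, true_atoms):
    """Return True if the clause is satisfied by the given set of true atoms.

    The clause is violated only when every antecedent is true
    and no consequent is true.
    """
    if all(a in true_atoms for a in antecedents):
        return any(b in true_atoms for b in consequents)
    return True


# likes(bob, trout) follows from likes(bob, fish) and fish(trout)
consequents = [("likes", "bob", "trout")]
antecedents = [("likes", "bob", "fish"), ("fish", "trout")]
```

With only the two antecedents taken as true, the clause is violated; adding `("likes", "bob", "trout")` to the true atoms satisfies it.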
谓词演算提供了一种表达命题集合的方法。命题集合的一个用途是确定是否可以从中推断出任何有趣或有用的事实。这与数学家的工作完全类似,他们努力发现可以从已知公理和定理中推断出的新定理。
Predicate calculus provides a method of expressing collections of propositions. One use of collections of propositions is to determine whether any interesting or useful facts can be inferred from them. This is exactly analogous to the work of mathematicians, who strive to discover new theorems that can be inferred from known axioms and theorems.
在计算机科学发展的早期(20 世纪 50 年代和 60 年代初),人们对定理证明过程的自动化产生了浓厚的兴趣。自动定理证明领域最重大的突破之一是锡拉丘兹大学的艾伦·罗宾逊 (1965)发现的归结原理。
The early days of computer science (the 1950s and early 1960s) saw a great deal of interest in automating the theorem-proving process. One of the most significant breakthroughs in automatic theorem proving was the discovery of the resolution principle by Alan Robinson (1965) at Syracuse University.
归结是一种推理规则,允许从给定命题计算出推断命题,从而提供一种可能应用于自动定理证明的方法。归结被设计用于子句形式的命题。归结的概念如下:假设有两个命题,形式如下
Resolution is an inference rule that allows inferred propositions to be computed from given propositions, thus providing a method with potential application to automatic theorem proving. Resolution was devised to be applied to propositions in clausal form. The concept of resolution is the following: Suppose there are two propositions with the forms
P₁ ⊂ P₂
Q₁ ⊂ Q₂
Their meaning is that P₂ implies P₁ and Q₂ implies Q₁. Furthermore, suppose that P₁ is identical to Q₂, so that we could rename P₁ and Q₂ as T. Then, we could rewrite the two propositions as
T ⊂ P₂
Q₁ ⊂ T
Now, because P₂ implies T and T implies Q₁, it is logically obvious that P₂ implies Q₁, which we could write as
Q₁ ⊂ P₂
由原来的两个命题推出这个命题的过程就是解析。
The process of inferring this proposition from the original two propositions is resolution.
再举一个例子,考虑以下两个命题:
As another example, consider the two propositions:
older(joanne, jake) ⊂ mother(joanne, jake)
wiser(joanne, jake) ⊂ older(joanne, jake)
根据这些命题,可以用解析法构造以下命题:
From these propositions, the following proposition can be constructed using resolution:
wiser(joanne, jake) ⊂ mother(joanne, jake)
The mechanics of this resolution construction are simple: The terms of the left sides of the two clausal propositions are OR’d together to make the left side of the new proposition. Then the right sides of the two clausal propositions are AND’d together to get the right side of the new proposition. Next, any term that appears on both sides of the new proposition is removed from both sides. The process is exactly the same when the propositions have multiple terms on either or both sides. The left side of the new inferred proposition initially contains all of the terms of the left sides of the two given propositions. The new right side is similarly constructed. Then the term that appears on both sides of the new proposition is removed. For example, if we have
father(bob, jake) ∪ mother(bob, jake) ⊂ parent(bob, jake)
grandfather(bob, fred) ⊂ father(bob, jake) ∩ father(jake, fred)
resolution says that
mother(bob, jake) ∪ grandfather(bob, fred) ⊂ parent(bob, jake) ∩ father(jake, fred)
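which has all but one of the atomic propositions of both of the original propositions. The one atomic proposition that allowed the operation, father(bob, jake), which appears in the left side of the first and in the right side of the second, is left out. In English, we would say
if: bob is the parent of jake implies that bob is either the father or mother of jake
and: bob is the father of jake and jake is the father of fred implies that bob is the grandfather of fred
then: if bob is the parent of jake and jake is the father of fred, then either bob is jake's mother or bob is fred's grandfather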
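The mechanical construction described above can be sketched in Prolog, the language introduced later in this chapter. This is only an illustration for variable-free propositions. The encoding of a clausal proposition as a Left-Right pair of lists and the predicate name resolve are assumptions made for this sketch, and intersection/3 and subtract/3 are assumed to come from a list library such as the one in SWI-Prolog.

```prolog
:- use_module(library(lists)).  % append/3, intersection/3, subtract/3 (assumed: SWI-Prolog)

% A clausal proposition is written Left-Right (an assumed encoding):
% Left is the list of OR'd terms, Right is the list of AND'd terms.

% resolve(+Clause1, +Clause2, -Resolvent): combine two variable-free clauses.
resolve(L1-R1, L2-R2, L-R) :-
    append(L1, L2, L0),            % OR the two left sides together
    append(R1, R2, R0),            % AND the two right sides together
    intersection(L0, R0, Common),  % terms that appear on both sides
    Common \== [],                 % resolution needs at least one such term
    subtract(L0, Common, L),       % remove them from the new left side
    subtract(R0, Common, R).       % and from the new right side
```

Under these assumptions, the query resolve([older(joanne, jake)]-[mother(joanne, jake)], [wiser(joanne, jake)]-[older(joanne, jake)], C) binds C to [wiser(joanne, jake)]-[mother(joanne, jake)], which matches the first example in the text.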
Resolution is actually more complex than these simple examples illustrate. In particular, the presence of variables in propositions requires resolution to find values for those variables that allow the matching process to succeed. This process of determining useful values for variables is called unification. The temporary assigning of values to variables to allow unification is called instantiation.
It is common for the resolution process to instantiate a variable with a value, fail to complete the required matching, and then be required to backtrack and instantiate the variable with a different value. We will discuss unification and backtracking more extensively in the context of Prolog.
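The instantiate, fail, and backtrack cycle can be observed in a very small Prolog program. The predicates candidate and even_candidate below are hypothetical names chosen for this sketch; the notation is explained later in the chapter.

```prolog
% Three facts; Prolog tries them in the order written.
candidate(1).
candidate(2).
candidate(3).

% even_candidate(-X): X is a candidate that is even.
even_candidate(X) :-
    candidate(X),      % instantiates X, beginning with 1
    X mod 2 =:= 0.     % fails for 1, which forces a different instantiation
```

For the query even_candidate(X), the system first instantiates X to 1, fails the arithmetic test, backtracks, instantiates X to 2, and then succeeds.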
A critically important property of resolution is its ability to detect any inconsistency in a given set of propositions. This ability rests on a formal property of resolution: it is refutation complete. That is, given a set of inconsistent propositions, resolution can prove them to be inconsistent. This allows resolution to be used to prove theorems, which can be done as follows: We can envision a theorem proof in terms of predicate calculus as a given set of pertinent propositions, with the negation of the theorem itself stated as a new proposition. The theorem is negated so that resolution can be used to prove the theorem by finding an inconsistency. This is proof by contradiction, a frequently used approach to proving theorems in mathematics. Typically, the original propositions are called the hypotheses, and the negation of the theorem is called the goal.
Theoretically, this process is valid and useful. The time required for resolution, however, can be a problem. Although resolution is a finite process when the set of propositions is finite, the time required to find an inconsistency in a large database of propositions may be huge.
Theorem proving is the basis for logic programming. Much of what is computed can be couched in the form of a list of given facts and relationships as hypotheses, and a goal to be inferred from the hypotheses, using resolution.
Resolution on hypotheses and a goal that are general propositions, even if they are in clausal form, is often not practical. Although it may be possible to prove a theorem using clausal form propositions, it may not happen in a reasonable amount of time. One way to simplify the resolution process is to restrict the form of the propositions. One useful restriction is to require the propositions to be Horn clauses. Horn clauses can be in only one of two forms: They have either a single atomic proposition on the left side or an empty left side. The left side of a clausal form proposition is sometimes called the head, and Horn clauses with left sides are called headed Horn clauses. Headed Horn clauses are used to state relationships, such as
likes(bob, trout) ⊂ likes(bob, fish) ∩ fish(trout)
Horn clauses with empty left sides, which are often used to state facts, are called headless Horn clauses. For example,
father(bob, jake)
Most, but not all, propositions can be stated as Horn clauses. The restriction to Horn clauses makes resolution a practical process for proving theorems.
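Looking ahead to the Prolog notation introduced later in this chapter, the two Horn clause examples above might be written as follows. This is a sketch only; the two extra facts likes(bob, fish) and fish(trout) are assumptions added so that the rule has something to work with.

```prolog
% Assumed supporting facts (not part of the examples in the text).
fish(trout).

% The fact from the text.
father(bob, jake).

% A single atomic proposition on the left of :- and a conjunction
% on the right, mirroring  likes(bob, trout) ⊂ likes(bob, fish) ∩ fish(trout)
likes(bob, fish).
likes(bob, trout) :- likes(bob, fish), fish(trout).
```

With these clauses loaded, the goals likes(bob, trout) and father(bob, jake) can both be proved.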
Languages used for logic programming are called declarative languages, because programs written in them consist of declarations rather than assignments and control flow statements. These declarations are actually statements, or propositions, in symbolic logic.
One of the essential characteristics of logic programming languages is their semantics, which is called declarative semantics. The basic concept of this semantics is that there is a simple way to determine the meaning of each statement, and it does not depend on how the statement might be used to solve a problem. Declarative semantics is considerably simpler than the semantics of the imperative languages. For example, the meaning of a given proposition in a logic programming language can be concisely determined from the statement itself. In an imperative language, the semantics of a simple assignment statement requires examination of local declarations, knowledge of the scoping rules of the language, and possibly even examination of programs in other files just to determine the types of the variables in the assignment statement. Then, assuming the expression of the assignment contains variables, the execution of the program prior to the assignment statement must be traced to determine the values of those variables. The resulting action of the statement, then, depends on its run-time context. Comparing this semantics with that of a proposition in a logic language, with no need to consider textual context or execution sequences, it is clear that declarative semantics is far simpler than the semantics of imperative languages. Thus, declarative semantics is often stated as one of the advantages that declarative languages have over imperative languages (Hogger, 1984, pp. 240–241).
Programming in both imperative and functional languages is primarily procedural, which means that the programmer knows what is to be accomplished by a program and instructs the computer on exactly how the computation is to be done. In other words, the computer is treated as a simple device that obeys orders. Everything that is computed must have every detail of that computation spelled out. Some believe that this is the essence of the difficulty of programming using imperative and functional languages.
Programming in a logic programming language is nonprocedural. Programs in such languages do not state exactly how a result is to be computed but rather describe the form of the result. The difference is that we assume the computer system can somehow determine how the result is to be computed. What is needed to provide this capability for logic programming languages is a concise means of supplying the computer with both the relevant information and a method of inference for computing desired results. Predicate calculus supplies the basic form of communication to the computer, and resolution provides the inference technique.
Sorting is commonly used to illustrate the difference between procedural and nonprocedural systems. In a language like Java, sorting is done by explaining in a Java program all of the details of some sorting algorithm to a computer that has a Java compiler. The computer, after translating the Java program into machine code or some interpretive intermediate code, follows the instructions and produces the sorted list.
In a nonprocedural language, it is necessary only to describe the characteristics of the sorted list: It is some permutation of the given list such that for each pair of adjacent elements, a given relationship holds between the two elements. To state this formally, suppose the list to be sorted is in an array named list that has a subscript range 1 . . . n. The concept of sorting the elements of the given list, named old_list, and placing them in a separate array, named new_list, can then be expressed as follows:
sort(old_list, new_list) ⊂ permute(old_list, new_list) ∩ sorted(new_list)
sorted(list) ⊂ ∀j such that 1 ≤ j < n, list(j) ≤ list(j + 1)
where permute is a predicate that returns true if its second parameter array is a permutation of its first parameter array.
From this description, the nonprocedural language system could produce the sorted list. That makes nonprocedural programming sound like the mere production of concise software requirements specifications, which is a fair assessment. Unfortunately, however, it is not that simple. Logic programs that use only resolution face serious problems of execution efficiency. In our example of sorting, if the list is long, the number of permutations is huge, and they must be generated and tested, one by one, until the one that is in order is found—a very lengthy process. Of course, one must consider the possibility that the best form of a logic language may not yet have been determined, and good methods of creating programs in logic programming languages for large problems have not yet been developed.
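The sorting specification translates almost directly into Prolog, which also makes the efficiency problem concrete. The sketch below uses Prolog lists rather than arrays, names the predicate naive_sort (an assumed name, chosen to avoid clashing with any built-in sort), and assumes a list library that provides permutation/2, as SWI-Prolog does.

```prolog
:- use_module(library(lists)).  % permutation/2 (assumed: SWI-Prolog)

% sorted(+List): every adjacent pair is in nondecreasing order.
sorted([]).
sorted([_]).
sorted([X, Y | Rest]) :- X =< Y, sorted([Y | Rest]).

% naive_sort(+Old, -New): New is a permutation of Old that is sorted.
% Permutations are generated one at a time and tested, so the work
% can grow with the factorial of the list length.
naive_sort(Old, New) :- permutation(Old, New), sorted(New).
```

A query such as naive_sort([3, 1, 2], L) binds L to [1, 2, 3]. A list of 10 distinct elements already has 3,628,800 permutations, which is why this generate-and-test formulation is not usable for long lists.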
As was stated in Chapter 2, Alain Colmerauer and Philippe Roussel at the University of Aix-Marseille, with some assistance from Robert Kowalski at the University of Edinburgh, developed the fundamental design of Prolog. Colmerauer and Roussel were interested in natural-language processing; Kowalski was interested in automated theorem proving. The collaboration between the University of Aix-Marseille and the University of Edinburgh continued until the mid-1970s. Since then, research on the development and use of the language has progressed independently at those two locations, resulting in, among other things, two syntactically different dialects of Prolog.
The development of Prolog and other research efforts in logic programming received limited attention outside of Edinburgh and Marseille until the announcement in 1981 that the Japanese government was launching a large research project called the Fifth Generation Computing Systems (FGCS; Fuchi, 1981; Moto-oka, 1981). One of the primary objectives of the project was to develop intelligent machines, and Prolog was chosen as the basis for this effort. The announcement of FGCS aroused a sudden strong interest in artificial intelligence and logic programming in researchers and the governments of the United States and several European countries.
After a decade of effort, the FGCS project was quietly dropped. Despite the great assumed potential of logic programming and Prolog, little of great significance had been discovered. This led to the decline in the interest in and use of Prolog, although it still has its proponents.
There are now a number of different dialects of Prolog. These can be grouped into several categories: those that grew from the Marseille group, those that came from the Edinburgh group, and some dialects that have been developed for microcomputers, such as micro-Prolog, which is described by Clark and McCabe (1984). The syntactic forms of these are somewhat different. Rather than attempt to describe the syntax of several dialects of Prolog or some hybrid of them, we have chosen one particular, widely available dialect, which is the one developed at Edinburgh. This form of the language is sometimes called Edinburgh syntax. Its first implementation was on a DEC System-10 (Warren et al., 1979). Prolog implementations are available for virtually all popular computer platforms, for example, from the Free Software Foundation (http://www.gnu.org).
As with programs in other languages, Prolog programs consist of collections of statements. There are only a few kinds of statements in Prolog, but they can be complex. All Prolog statements, as well as Prolog data, are constructed from terms.
A Prolog term is a constant, a variable, or a structure. A constant is either an atom or an integer. Atoms are the symbolic values of Prolog and are similar to their counterparts in LISP. In particular, an atom is either a string of letters, digits, and underscores that begins with a lowercase letter or a string of any printable ASCII characters delimited by apostrophes.
A variable is any string of letters, digits, and underscores that begins with an uppercase letter or an underscore ( _ ). Variables are not bound to types by declarations. The binding of a value, and thus a type, to a variable is called an instantiation. Instantiation occurs only in the resolution process. A variable that has not been assigned a value is called uninstantiated. Instantiations last only as long as it takes to satisfy one complete goal, which involves the proof or disproof of one proposition. Prolog variables are only distant relatives, in terms of both semantics and use, to the variables in the imperative languages.
The last kind of term is called a structure. Structures represent the atomic propositions of predicate calculus, and their general form is the same:
functor(parameter list)
The functor is any atom and is used to identify the structure. The parameter list can be any list of atoms, variables, or other structures. As discussed at length in the following subsection, structures are the means of specifying facts in Prolog. They can also be thought of as objects, in which case they allow facts to be stated in terms of several related atoms. In this sense, structures are relations, for they state relationships among terms. A structure is also a predicate when its context specifies it to be a query (question).
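The kinds of terms described above can be told apart with the standard type-checking built-ins var/1, atom/1, integer/1, and compound/1. The predicate term_kind and its labels are hypothetical names used only for this sketch.

```prolog
% term_kind(+Term, -Kind): classify a term using the categories in the text.
term_kind(T, variable)  :- var(T).        % an uninstantiated variable
term_kind(T, atom)      :- atom(T).       % jake, 'Hello world'
term_kind(T, integer)   :- integer(T).    % 42
term_kind(T, structure) :- compound(T).   % father(bill, jake)
```

For example, term_kind(father(bill, jake), K) binds K to structure, and term_kind('Hello world', K) binds K to atom because the apostrophes delimit a single atom.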
Our discussion of Prolog statements begins with those statements used to construct the hypotheses, or database of assumed information—the statements from which new information can be inferred.
Prolog has two basic statement forms; these correspond to the headless and headed Horn clauses of predicate calculus. The simplest form of headless Horn clause in Prolog is a single structure, which is interpreted as an unconditional assertion, or fact. Logically, facts are simply propositions that are assumed to be true.
The following examples illustrate the kinds of facts one can have in a Prolog program. Notice that every Prolog statement is terminated by a period.
female(shelley).
male(bill).
female(mary).
male(jake).
father(bill, jake).
father(bill, shelley).
mother(mary, jake).
mother(mary, shelley).
These simple structures state certain facts about jake, shelley, bill, and mary. For example, the first states that shelley is a female. The last four connect their two parameters with a relationship that is named in the functor atom; for example, the fifth proposition might be interpreted to mean that bill is the father of jake. Note that these Prolog propositions, like those of predicate calculus, have no intrinsic semantics. They mean whatever the programmer wants them to mean. For example, the proposition
father(bill, jake).
could mean bill and jake have the same father or that jake is the father of bill. The most common and straightforward meaning, however, might be that bill is the father of jake.
The other basic form of Prolog statement for constructing the database corresponds to a headed Horn clause. This form can be related to a known theorem in mathematics from which a conclusion can be drawn if the set of given conditions is satisfied. The right side is the antecedent, or if part, and the left side is the consequent, or then part. If the antecedent of a Prolog statement is true, then the consequent of the statement must also be true. Because they are Horn clauses, the consequent of a Prolog statement is a single term, while the antecedent can be either a single term or a conjunction.
Conjunctions contain multiple terms that are separated by logical AND operations. In Prolog, the AND operation is implied. The structures that specify atomic propositions in a conjunction are separated by commas, so one could consider the commas to be AND operators. As an example of a conjunction, consider the following:
female(shelley), child(shelley).
The general form of the Prolog headed Horn clause statement is
consequence :- antecedent_expression.
It is read as follows: “consequence can be concluded if the antecedent expression is true or can be made to be true by some instantiation of its variables.” For example,
ancestor(mary, shelley) :- mother(mary, shelley).
states that if mary is the mother of shelley, then mary is an ancestor of shelley. Headed Horn clauses are called rules, because they state rules of implication between propositions.
As with clausal form propositions in predicate calculus, Prolog statements can use variables to generalize their meaning. Recall that variables in clausal form provide a kind of implied universal quantifier. The following demonstrates the use of variables in Prolog statements:
parent(X, Y) :- mother(X, Y).
parent(X, Y) :- father(X, Y).
grandparent(X, Z) :- parent(X, Y) , parent(Y, Z).
These statements give rules of implication among some variables, or universal objects. In this case, the universal objects are X, Y, and Z. The first rule states that if there are instantiations of X and Y such that mother(X, Y) is true, then for those same instantiations of X and Y, parent(X, Y) is true.
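To see these rules at work, they can be combined with the facts given earlier into one small program. The fact father(jake, fred) is a hypothetical addition for this sketch, included only so that a grandparent relationship exists in the database.

```prolog
% Facts from the earlier example.
mother(mary, jake).
mother(mary, shelley).
father(bill, jake).
father(bill, shelley).
father(jake, fred).      % hypothetical extra fact for illustration

% Rules from the text.
parent(X, Y) :- mother(X, Y).
parent(X, Y) :- father(X, Y).
grandparent(X, Z) :- parent(X, Y), parent(Y, Z).
```

For the query grandparent(G, fred), the variable Y in the rule must be instantiated to jake before both subgoals can be satisfied; the answers are then G = mary and, on request for another solution, G = bill.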
The = operator, which is an infix operator, succeeds if its two term operands are the same or can be made the same by unification. For example, X = Y. The not operator, which is a unary operator, reverses its operand, in the sense that it succeeds if its operand fails. For example, not(X = Y) succeeds if X is not equal to Y.
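A common use of these two operators is to exclude a trivial match. The sketch below defines a hypothetical sibling rule over the family facts used earlier. It writes negation as \+, the ISO Prolog spelling of the same operation, on the assumption that not every Prolog system provides not.

```prolog
mother(mary, jake).
mother(mary, shelley).
father(bill, jake).
father(bill, shelley).

parent(X, Y) :- mother(X, Y).
parent(X, Y) :- father(X, Y).

% sibling(?X, ?Y): X and Y share a parent and are different people.
sibling(X, Y) :-
    parent(P, X),
    parent(P, Y),
    \+ (X = Y).        % placed last, after X and Y have been instantiated
```

The position of the negated subgoal matters: if X and Y were still uninstantiated when it was reached, X = Y would always succeed by unification and its negation would always fail.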
So far, we have described the Prolog statements for logical propositions, which are used to describe both known facts and rules that describe logical relationships among facts. These statements are the basis for the theorem-proving model. The theorem is in the form of a proposition that we want the system to either prove or disprove. In Prolog, these propositions are called goals, or queries. The syntactic form of Prolog goal statements is identical to that of headless Horn clauses. For example, we could have
man(fred).
to which the system will respond either yes or no. The answer yes means that the system has proved the goal was true under the given database of facts and relationships. The answer no means that either the goal was determined to be false or the system was simply unable to prove it.
Conjunctive propositions and propositions with variables are also legal goals. When variables are present, the system not only asserts the validity of the goal but also identifies the instantiations of the variables that make the goal true. For example,
father(X, mike).
can be asked. The system will then attempt, through unification, to find an instantiation of X that results in a true value for the goal.
Because goal statements and some nongoal statements have the same form (headless Horn clauses), a Prolog implementation must have some means of distinguishing between the two. Interactive Prolog implementations do this by simply having two modes, indicated by different interactive prompts: one for entering fact and rule statements and one for entering goals. The user can change the mode at any time.
This section examines Prolog resolution. Efficient use of Prolog requires that the programmer know precisely what the Prolog system does with his or her program.
When a goal is a compound proposition, each of the facts (structures) is called a subgoal. To prove that a goal is true, the inferencing process must find a chain of inference rules and/or facts in the database that connect the goal to one or more facts in the database. For example, if Q is the goal, then either Q must be found as a fact in the database or the inferencing process must find a fact P₁ and a sequence of propositions P₂, P₃, . . . , Pₙ such that
P₂ :- P₁
P₃ :- P₂
. . .
Q :- Pₙ
Of course, the process can be and often is complicated by rules with compound right sides and rules with variables. The process of finding the Ps, when they exist, is basically a comparison, or matching, of terms with each other.
Because the process of proving a subgoal is done through a proposition-matching process, it is sometimes called matching. In some cases, proving a subgoal is called satisfying that subgoal.
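A chain of this kind, with three intermediate propositions, might look like the following. The atoms p1, p2, p3, and q are placeholders chosen for this sketch.

```prolog
p1.            % the fact at the start of the chain
p2 :- p1.
p3 :- p2.
q  :- p3.      % the goal q is connected to the fact p1 through p3 and p2
```

To satisfy the goal q, the system matches q with the head of the last rule, which creates the subgoal p3, then p2, then p1, which is found as a fact.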
Consider the following query:
man(bob).
This goal statement is the simplest kind. It is relatively easy for resolution to determine whether it is true or false: The pattern of this goal is compared with the facts and rules in the database. If the database includes the fact
man(bob).
the proof is trivial. If, however, the database contains the following fact and inference rule,
father(bob).
man(X) :- father(X).
Prolog would be required to find these two statements and use them to infer the truth of the goal. This would necessitate unification to instantiate X temporarily to bob.
Now consider the goal
man(X).
In this case, Prolog must match the goal against the propositions in the database. The first proposition that it finds that has the form of the goal, with any object as its parameter, will cause X to be instantiated with that object’s value. X is then displayed as the result. If there is no proposition having the form of the goal, the system indicates, by saying no, that the goal cannot be satisfied.
There are two opposite approaches to attempting to match a given goal to a fact in the database. The system can begin with the facts and rules of the database and attempt to find a sequence of matches that lead to the goal. This approach is called bottom-up resolution, or forward chaining. The alternative is to begin with the goal and attempt to find a sequence of matching propositions that lead to some set of original facts in the database. This approach is called top-down resolution, or backward chaining. In general, backward chaining works well when there is a reasonably small set of candidate answers. The forward chaining approach is better when the number of possibly correct answers is large; in this situation, backward chaining would require a very large number of matches to get to an answer. Prolog implementations use backward chaining for resolution, presumably because its designers believed backward chaining was more suitable for a larger class of problems than forward chaining.
The following example illustrates the difference between forward and backward chaining. Consider the query:
man(bob).
Assume the database contains
father(bob).
man(X) :- father(X).
Forward chaining would search for and find the first proposition. The goal is then inferred by matching the first proposition with the right side of the second rule (father(X)) through instantiation of X to bob and then matching the left side of the second proposition to the goal. Backward chaining would first match the goal with the left side of the second proposition (man(X)) through the instantiation of X to bob. As its last step, it would match the right side of the second proposition (now father(bob)) with the first proposition.
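The two strategies can be compared side by side by storing the database as data and writing both searches explicitly. This is a sketch under stated assumptions, not a description of how any Prolog system is implemented: the rule(Head, Body) encoding and the predicate names backward, forward, and all_known are invented for the illustration, and a fact is represented as a rule with an empty body.

```prolog
:- use_module(library(lists)).  % member/2, memberchk/2

% The database from the text, as rule(Head, ListOfBodyGoals).
rule(father(bob), []).
rule(man(X), [father(X)]).

% Backward chaining: start from the goal and work toward the facts.
backward(Goal) :-
    rule(Goal, Body),           % match the goal with a left side
    backward_all(Body).         % then prove each right-side subgoal

backward_all([]).
backward_all([G | Gs]) :- backward(G), backward_all(Gs).

% Forward chaining: start from what is known and add consequences
% until the goal appears or nothing new can be derived.
forward(Goal) :- forward(Goal, []).

forward(Goal, Known) :-
    memberchk(Goal, Known).
forward(Goal, Known) :-
    rule(Head, Body),
    all_known(Body, Known),     % every right-side term is already known
    \+ memberchk(Head, Known),  % only add something new
    forward(Goal, [Head | Known]).

all_known([], _).
all_known([G | Gs], Known) :- member(G, Known), all_known(Gs, Known).
```

Both backward(man(bob)) and forward(man(bob)) succeed. The backward search first instantiates X to bob by matching the goal with man(X); the forward search first records father(bob) and only then derives man(bob).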
The next design question arises whenever the goal has more than one structure, as in our example. The question then is whether the solution search is done depth first or breadth first. A depth-first search finds a complete sequence of propositions—a proof—for the first subgoal before working on the others. A breadth-first search works on all subgoals of a given goal in parallel. Prolog’s designers chose the depth-first approach primarily because it can be done with fewer computer resources. The breadth-first approach is a parallel search that can require a large amount of memory.
The last feature of Prolog’s resolution mechanism that must be discussed is backtracking. When a goal with multiple subgoals is being processed and the system fails to show the truth of one of the subgoals, the system abandons the subgoal it cannot prove. It then reconsiders the previous subgoal, if there is one, and attempts to find an alternative solution to it. This backing up in the goal to the reconsideration of a previously proven subgoal is called backtracking. A new solution is found by beginning the search where the previous search for that subgoal stopped. Multiple solutions to a subgoal result from different instantiations of its variables. Backtracking can require a great deal of time and space because it may have to find all possible proofs to every subgoal. These subgoal proofs may not be organized to minimize the time required to find the one that will result in the final complete proof, which exacerbates the problem.
To solidify your understanding of backtracking, consider the following example. Assume that there is a set of facts and rules in the database and that Prolog has been presented with the following compound goal:
male(X), parent(X, shelley).
This goal asks whether there is an instantiation of X such that X is a male and X is a parent of shelley. As its first step, Prolog finds the first fact in the database with male as its functor. It then instantiates X to the parameter of the found fact, say mike. Then, it attempts to prove that parent(mike, shelley) is true. If it fails, it backtracks to the first subgoal, male(X), and attempts to resatisfy it with some alternative instantiation of X. The resolution process may have to find every male in the database before it finds the one that is a parent of shelley. It definitely must find all males to prove that the goal cannot be satisfied. Note that our example goal might be processed more efficiently if the order of the two subgoals were reversed. Then, only after resolution had found a parent of shelley would it try to match that person with the male subgoal. This is more efficient if shelley has fewer parents than there are males in the database, which seems like a reasonable assumption. Section 16.7.1 discusses a method of limiting the backtracking done by a Prolog system.
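The effect of subgoal ordering can be sketched with a small database. The facts below are hypothetical and exist only for illustration (only mike and shelley appear in the text); father_slow and father_fast are made-up names for the two orderings of the same compound goal.

```prolog
% Hypothetical facts, invented for this illustration.
male(mike).
male(bob).
male(bill).
male(jake).

parent(mary, shelley).
parent(bill, shelley).

% Original order: each male is tried in database order until one
% is also a parent of shelley.
father_slow(X) :- male(X), parent(X, shelley).

% Reversed order: only the parents of shelley are enumerated, and
% each one is then tested against the male facts.
father_fast(X) :- parent(X, shelley), male(X).
```

Both father_slow(X) and father_fast(X) instantiate X to bill. The first ordering rejects mike and bob before reaching bill; the second rejects only mary. With a larger set of male facts, the gap between the two orderings would be expected to grow.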
Database searches in Prolog always proceed in the direction of first to last.
The following two subsections describe Prolog examples that further illustrate the resolution process.
Prolog supports integer variables and integer arithmetic. Originally, the arithmetic operators were functors, so that the sum of 7 and the variable X was formed with
+(7, X)
Prolog now allows a more abbreviated syntax for arithmetic with the is operator. This operator takes an arithmetic expression as its right operand and a variable as its left operand. All variables in the expression must already be instantiated, but the left-side variable cannot be previously instantiated. For example, in
A is B / 17 + C.
if B and C are instantiated but A is not, then this clause will cause A to be instantiated with the value of the expression. When this happens, the clause is satisfied. If either B or C is not instantiated or A is instantiated, the clause is not satisfied and no instantiation of A can take place. The semantics of an is proposition is considerably different from that of an assignment statement in an imperative language. This difference can lead to an interesting scenario. Because the is operator makes the clause in which it appears look like an assignment statement, a beginning Prolog programmer may be tempted to write a statement such as
Sum is Sum + Number.
which is never useful, or even legal, in Prolog. If Sum is not instantiated, the reference to it in the right side is undefined and the clause fails. If Sum is already instantiated, the clause fails, because the left operand cannot have a current instantiation when is is evaluated. In either case, the instantiation of Sum to the new value will not take place. (If the value of Sum + Number is required, it can be bound to some new name.)
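Binding the result to a new name can be sketched as follows. The predicate add_to_sum and the variable New_Sum are hypothetical names chosen for this illustration; they do not come from the text.

```prolog
% add_to_sum(+Sum, +Number, -New_Sum)
% Sum and Number must already be instantiated to numbers.
% New_Sum is a fresh variable, so the is goal can instantiate it.
add_to_sum(Sum, Number, New_Sum) :-
    New_Sum is Sum + Number.
```

The query add_to_sum(10, 5, New_Sum) instantiates New_Sum to 15. Sum itself is never changed; the larger value simply lives under a different name.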
Prolog does not have assignment statements in the same sense as imperative languages. They are simply not needed in most of the programming for which Prolog was designed. The usefulness of assignment statements in imperative languages often depends on the programmer's ability to control the flow of execution of the code in which the assignment statement is embedded. Because this type of control is not always possible in Prolog, such statements are far less useful.
As a simple example of the use of numeric computation in Prolog, consider the following problem: Suppose we know the average speeds of several automobiles on a particular racetrack and the amount of time they are on the track. This basic information can be coded as facts, and the relationship between speed, time, and distance can be written as a rule, as in the following:
speed(ford, 100).
speed(chevy, 105).
speed(dodge, 95).
speed(volvo, 80).
time(ford, 20).
time(chevy, 21).
time(dodge, 24).
time(volvo, 24).
distance(X, Y) :- speed(X, Speed),
time(X, Time),
Y is Speed * Time.
Now, queries can request the distance traveled by a particular car. For example, the query
distance(chevy, Chevy_Distance).
instantiates Chevy_Distance with the value 2205. The first two clauses in the right side of the distance computation statement instantiate the variables Speed and Time with the corresponding values of the given automobile functor. After satisfying the goal, Prolog also displays the name Chevy_Distance and its value.
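Because distance is a relation rather than a function, the same rule can also be queried with the car left uninstantiated. The sketch below repeats the facts from the text so that it is self-contained; all_distances is a hypothetical helper, and it relies on the standard findall predicate, which this chapter does not discuss.

```prolog
speed(ford, 100).
speed(chevy, 105).
speed(dodge, 95).
speed(volvo, 80).
time(ford, 20).
time(chevy, 21).
time(dodge, 24).
time(volvo, 24).

distance(X, Y) :- speed(X, Speed),
                  time(X, Time),
                  Y is Speed * Time.

% all_distances(-Pairs): hypothetical helper, not from the text.
% findall collects one Car-Distance pair for every way in which
% distance(Car, Distance) can be satisfied, in database order.
all_distances(Pairs) :-
    findall(Car-Distance, distance(Car, Distance), Pairs).
```

The query all_distances(Pairs) instantiates Pairs to [ford-2000, chevy-2205, dodge-2280, volvo-1920]. The order follows the order of the speed facts, since the database is always searched from first to last.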
At this point it is instructive to take an operational look at how a Prolog system produces results. Prolog has a built-in structure named trace that displays the instantiations of values to variables at each step during the attempt to satisfy a given goal. trace is used to understand and debug Prolog programs. To understand trace, it is best to introduce a different model of the execution of Prolog programs, called the tracing model.
The tracing model describes Prolog execution in terms of four events: (1) call, which occurs at the beginning of an attempt to satisfy a goal, (2) exit, which occurs when a goal has been satisfied, (3) redo, which occurs when backtracking causes an attempt to resatisfy a goal, and (4) fail, which occurs when a goal fails. Call and exit can be related directly to the execution model of a subprogram in an imperative language if processes like distance are thought of as subprograms. The other two events are unique to logic programming systems. In the following trace example, a trace of the computation of the value for Chevy_Distance, the goal requires no redo or fail events:
trace.
distance(chevy, Chevy_Distance).
(1) 1 Call: distance(chevy, _0)?
(2) 2 Call: speed(chevy, _5)?
(2) 2 Exit: speed(chevy, 105)
(3) 2 Call: time(chevy, _6)?
(3) 2 Exit: time(chevy, 21)
(4) 2 Call: _0 is 105*21?
(4) 2 Exit: 2205 is 105*21
(1) 1 Exit: distance(chevy, 2205)
Chevy_Distance = 2205
Symbols in the trace that begin with the underscore character ( _ ) are internal variables used to store instantiated values. The first column of the trace indicates the subgoal whose match is currently being attempted. For example, in the example trace, the first line with the indication (3) is an attempt to instantiate the temporary variable _6 with a time value for chevy, where time is the second term in the right side of the statement that describes the computation of distance. The second column indicates the call depth of the matching process. The third column indicates the current action.
To illustrate backtracking, consider the following example database and traced compound goal:
likes(jake, chocolate).
likes(jake, apricots).
likes(darcie, licorice).
likes(darcie, apricots).
trace.
likes(jake, X), likes(darcie, X).
(1) 1 Call: likes(jake, _0)?
(1) 1 Exit: likes(jake, chocolate)
(2) 1 Call: likes(darcie, chocolate)?
(2) 1 Fail: likes(darcie, chocolate)
(1) 1 Redo: likes(jake, _0)?
(1) 1 Exit: likes(jake, apricots)
(3) 1 Call: likes(darcie, apricots)?
(3) 1 Exit: likes(darcie, apricots)
X = apricots
One can think about Prolog computations graphically as follows: Consider each goal as a box with four ports—call, fail, exit, and redo. Control enters a goal in the forward direction through its call port. Control can also enter a goal from the reverse direction through its redo port. Control can also leave a goal in two ways: If the goal succeeded, control leaves through the exit port; if the goal failed, control leaves through the fail port. A model of the example is shown in Figure 16.1. In this example, control flows through each subgoal twice. The second subgoal fails the first time, which forces a return through redo to the first subgoal.
[Figure 16.1: control flow model for the goal likes (jake, X), likes (darcie, X)]
So far, the only Prolog data structure we have discussed is the atomic proposition, which looks more like a function call than a data structure. Atomic propositions, which are also called structures, are actually a form of records. The other basic data structure supported is the list. Lists are sequences of any number of elements, where the elements can be atoms, atomic propositions, or any other terms, including other lists.
Prolog uses the syntax of ML and Haskell to specify lists. The list elements are separated by commas, and the entire list is delimited by square brackets, as in
[apple, prune, grape, kumquat]
The notation [] is used to denote the empty list. Instead of having explicit functions for constructing and dismantling lists, Prolog simply uses a special notation. [X | Y] denotes a list with head X and tail Y, where head and tail correspond to CAR and CDR in LISP. This is similar to the notation used in ML and Haskell.
A list can be created with a simple structure, as in
new_list([apple, prune, grape, kumquat]).
which states that the constant list [apple, prune, grape, kumquat] is a new element of the relation named new_list (a name we just made up). This statement does not bind the list to a variable named new_list; rather, it does the kind of thing that the proposition
male(jake)
does. That is, it states that [apple, prune, grape, kumquat] is a new element of new_list. Therefore, we could have a second proposition with a list argument, such as
new_list([apricot, peach, pear]).
In query mode, one of the elements of new_list can be dismantled into head and tail with
new_list([New_List_Head | New_List_Tail]).
If new_list has been set to have the two elements as shown, this statement instantiates New_List_Head with the head of the first list element (in this case, apple) and New_List_Tail with the tail of the list (or [prune, grape, kumquat]). If this were part of a compound goal and backtracking forced a new evaluation of it, New_List_Head and New_List_Tail would be reinstantiated to apricot and [peach, pear], respectively, because [apricot, peach, pear] is the next element of new_list.
The | operator used to dismantle lists can also be used to create lists from given instantiated head and tail components, as in
[Element_1 | List_2]
If Element_1 has been instantiated with pickle and List_2 has been instantiated with [peanut, prune, popcorn], the sample notation will create, for this one reference, the list [pickle, peanut, prune, popcorn].
As stated previously, the list notation that includes the | symbol is universal: It can specify either a list construction or a list dismantling. Note further that the following are equivalent:
[apricot, peach, pear | []]
[apricot, peach | [pear]]
[apricot | [peach, pear]]
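The two uses of the | notation, and the equivalence of the three forms above, can be checked with a short sketch. The predicates first_and_rest, cons, and same_list are hypothetical names introduced here; same_list uses the built-in = goal, which succeeds when its two operands unify and which the text has not introduced.

```prolog
% first_and_rest(+List, -Head, -Tail): dismantles a nonempty list.
first_and_rest([Head | Tail], Head, Tail).

% cons(+Element, +List, -New_List): builds a list from a head and a tail.
cons(Element, List, [Element | List]).

% same_list succeeds because all of these notations denote one list.
same_list :-
    [apricot, peach, pear | []] = [apricot, peach | [pear]],
    [apricot, peach | [pear]] = [apricot | [peach, pear]],
    [apricot | [peach, pear]] = [apricot, peach, pear].
```

The query first_and_rest([apple, prune, grape, kumquat], H, T) instantiates H to apple and T to [prune, grape, kumquat], while first_and_rest([], H, T) fails because the empty list has no head. The query cons(pickle, [peanut, prune, popcorn], L) instantiates L to [pickle, peanut, prune, popcorn]. Note that first_and_rest and cons state the same relationship with the parameters in a different order, which reflects the universality of the notation.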
With lists, certain basic operations are often required, such as those found in LISP, ML, and Haskell. As an example of such operations in Prolog, we examine a definition of append, which is related to such a function in LISP. In this example, the differences and similarities between functional and declarative languages can be seen. We need not specify how Prolog is to construct a new list from the given lists; rather, we need specify only the characteristics of the new list in terms of the given lists.
In appearance, the Prolog definition of append is very similar to the ML version that appears in Chapter 15, and a kind of recursion in resolution is used in a similar way to produce the new list. In the case of Prolog, the recursion is caused and controlled by the resolution process. As with ML and Haskell, a pattern-matching process is used to choose, based on the actual parameter, between two different definitions of the append process.
The first two parameters to the append operation in the following code are the two lists to be appended, and the third parameter is the resulting list:
append([], List, List).
append([Head | List_1], List_2, [Head | List_3]) :-
append(List_1, List_2, List_3).
The first proposition specifies that when the empty list is appended to any other list, that other list is the result. This statement corresponds to the recursion-terminating step of the ML append function. Note that the terminating proposition is placed before the recursion proposition. This is done because we know that Prolog will match the two propositions in order, starting with the first (because of its use of the depth-first order).
The second proposition specifies several characteristics of the new list. It corresponds to the recursion step in the ML function. The left-side predicate states that the first element of the new list is the same as the first element of the first given list, because they are both named Head. Whenever Head is instantiated to a value, all occurrences of Head in the goal are, in effect, simultaneously instantiated to that value. The right side of the second statement specifies that the tail of the first given list (List_1) has the second given list (List_2) appended to it to form the tail (List_3) of the resulting list.
One way to read the second statement of append is as follows: Appending the list [Head | List_1] to any list List_2 produces the list [Head | List_3], but only if the list List_3 is formed by appending List_1 to List_2. In LISP, this would be
(CONS (CAR FIRST) (APPEND (CDR FIRST) SECOND))
In both the Prolog and LISP versions, the resulting list is not constructed until the recursion produces the terminating condition; in this case, the first list must become empty. Then, the resulting list is built using the append function itself; the elements taken from the first list are added, in reverse order, to the second list. The reversing is done by the unraveling of the recursion.
One fundamental difference between Prolog's append and those of LISP and ML is that Prolog's append is a predicate: it does not return a list; it returns yes or no. The new list is the value of its third parameter.
To illustrate how the append process progresses, consider the following traced example:
trace.
append([bob, jo], [jake, darcie], Family).
(1) 1 Call: append([bob, jo], [jake, darcie], _10)?
(2) 2 Call: append([jo], [jake, darcie], _18)?
(3) 3 Call: append([], [jake, darcie], _25)?
(3) 3 Exit: append([], [jake, darcie], [jake, darcie])
(2) 2 Exit: append([jo], [jake, darcie], [jo, jake,
darcie])
(1) 1 Exit: append([bob, jo], [jake, darcie],
[bob, jo, jake, darcie])
Family = [bob, jo, jake, darcie]
yes
The first two calls, which represent subgoals, have List_1 nonempty, so they create the recursive calls from the right side of the second statement. The left side of the second statement effectively specifies the arguments for the recursive calls, or goals, thus dismantling the first list one element per step. When the first list becomes empty, in a call, or subgoal, the current instance of the right side of the second statement succeeds by matching the first statement. The effect of this is to return as the third parameter the value of the empty list appended to the second original parameter list. On successive exits, which represent successful matches, the elements that were removed from the first list are appended to the resulting list, Family. When the exit from the first goal is accomplished, the process is complete, and the resulting list is displayed.
Another difference between Prolog's append and those of LISP and ML is that Prolog's append is more flexible. For example, in Prolog we can use append to determine which pairs of lists can be appended to get [a, b, c] with
append(X, Y, [a, b, c]).
This results in the following:
X = []
Y = [a, b, c]
If we type a semicolon at this output we get the alternative result:
X = [a]
Y = [b, c]
Continuing, we get the following:
X = [a, b]
Y = [c];
X = [a, b, c]
Y = []
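Instead of typing a semicolon after each answer, all of the alternatives can be gathered in one step. The sketch below is one way to do this. It renames the text's append to my_append, on the assumption that the Prolog system in use may already supply an append of its own; splits and last_element are hypothetical helpers, and findall is a standard predicate not covered in this chapter.

```prolog
% The append definition from the text, renamed to avoid clashing
% with a library predicate of the same name.
my_append([], List, List).
my_append([Head | List_1], List_2, [Head | List_3]) :-
    my_append(List_1, List_2, List_3).

% splits(+List, -Splits): every X-Y pair such that appending X and Y
% gives List, in the order backtracking produces them.
splits(List, Splits) :-
    findall(X-Y, my_append(X, Y, List), Splits).

% last_element(+List, -Last): the same relation used in yet another
% direction, asking which one-element list ends List.
last_element(List, Last) :-
    my_append(_, [Last], List).
```

The query splits([a, b, c], S) instantiates S to [[]-[a, b, c], [a]-[b, c], [a, b]-[c], [a, b, c]-[]], which matches the four answers shown above. The query last_element([a, b, c], L) instantiates L to c.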
The append predicate can also be used to create other list operations, such as the following, whose effect we invite the reader to determine. Note that list_op_2 is meant to be used by providing a list as its first parameter and a variable as its second, and the result of list_op_2 is the value to which the second parameter is instantiated.
list_op_2([], []).
list_op_2([Head | Tail], List) :-
list_op_2(Tail, Result), append(Result, [Head], List).
As you may have been able to determine, list_op_2 causes the Prolog system to instantiate its second parameter with a list that has the elements of the list of the first parameter, but in reverse order. For example, list_op_2([apple, orange, grape], Q) instantiates Q with the list [grape, orange, apple].
Once again, although the LISP and Prolog languages are fundamentally different, similar operations can use similar approaches. In the case of the reverse operation, both Prolog's list_op_2 and LISP's reverse function include the recursion-terminating condition, along with the basic process of appending the reversal of the CDR or tail of the list to the CAR or head of the list to create the result list.
The following is a trace of this process, now named reverse:
trace.
reverse([a, b, c], Q).
(1) 1 Call: reverse([a, b, c], _6)?
(2) 2 Call: reverse([b, c], _65636)?
(3) 3 Call: reverse([c], _65646)?
(4) 4 Call: reverse([], _65656)?
(4) 4 Exit: reverse([], [])
(5) 4 Call: append([], [c], _65646)?
(5) 4 Exit: append([], [c], [c])
(3) 3 Exit: reverse([c], [c])
(6) 3 Call: append([c], [b], _65636)?
(7) 4 Call: append([], [b], _25)?
(7) 4 Exit: append([], [b], [b])
(6) 3 Exit: append([c], [b], [c, b])
(2) 2 Exit: reverse([b, c], [c, b])
(8) 2 Call: append([c, b], [a], _6)?
(9) 3 Call: append([b], [a], _32)?
(10) 4 Call: append([], [a], _39)?
(10) 4 Exit: append([], [a], [a])
(9) 3 Exit: append([b], [a], [b, a])
(8) 2 Exit: append([c, b], [a], [c, b, a])
(1) 1 Exit: reverse([a, b, c], [c, b, a])
Q = [c, b, a]
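The trace shows that this definition calls append once for every element, and each of those calls recurses over the partial result, so the total work grows roughly with the square of the list length. A common alternative, which is not the method described in the text, carries the partial result in an extra parameter. The sketch below uses the hypothetical name reverse_acc for both the two-parameter and three-parameter predicates.

```prolog
% reverse_acc(+List, -Reversed): starts with an empty accumulator.
reverse_acc(List, Reversed) :-
    reverse_acc(List, [], Reversed).

% reverse_acc(+Remaining, +Accumulator, -Reversed)
% When nothing remains, the accumulator is the reversed list.
reverse_acc([], Accumulator, Accumulator).
% Otherwise, move the head of the remaining list onto the front
% of the accumulator and continue with the tail.
reverse_acc([Head | Tail], Accumulator, Reversed) :-
    reverse_acc(Tail, [Head | Accumulator], Reversed).
```

The query reverse_acc([a, b, c], Q) instantiates Q to [c, b, a], as before, but it makes only one recursive call per element and does not use append at all.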
Suppose we need to be able to determine whether a given symbol is in a given list. A straightforward Prolog description of this is
member(Element, [Element | _]).
member(Element, [_ | List]) :- member(Element, List).
The underscore indicates an “anonymous” variable; it is used to mean that we do not care what instantiation it might get from unification. The first statement in the previous example succeeds if Element is the head of the list, either initially or after several recursions through the second statement. The second statement succeeds if Element is in the tail of the list. Consider the following traced examples:
trace.
member(a, [b, c, d]).
(1) 1 Call: member(a, [b, c, d])?
(2) 2 Call: member(a, [c, d])?
(3) 3 Call: member(a, [d])?
(4) 4 Call: member(a, [])?
(4) 4 Fail: member(a, [])
(3) 3 Fail: member(a, [d])
(2) 2 Fail: member(a, [c, d])
(1) 1 Fail: member(a, [b, c, d])
no
member(a, [b, a, c]).
(1) 1 Call: member(a, [b, a, c])?
(2) 2 Call: member(a, [a, c])?
(2) 2 Exit: member(a, [a, c])
(1) 1 Exit: member(a, [b, a, c])
yes
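Like append, this definition can be used in more than one direction. The sketch below renames it to my_member, on the assumption that the system in use may already provide member, and adds a hypothetical predicate named common that combines two membership tests.

```prolog
% The member definition from the text, renamed to avoid clashing
% with a library predicate of the same name.
my_member(Element, [Element | _]).
my_member(Element, [_ | List]) :- my_member(Element, List).

% common(-Element, +List_1, +List_2): Element occurs in both lists.
% The first subgoal generates candidates; the second tests them.
common(Element, List_1, List_2) :-
    my_member(Element, List_1),
    my_member(Element, List_2).
```

With a variable as its first parameter, my_member(X, [a, b, c]) instantiates X to a, and backtracking then produces b and c in turn. The query common(X, [chocolate, apricots], [licorice, apricots]) instantiates X to apricots after chocolate is rejected, which mirrors the backtracking seen in the earlier likes example.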
Although Prolog is a useful tool, it is neither a pure nor a perfect logic programming language. This section describes some of the problems with Prolog.
Prolog, for reasons of efficiency, allows the user to control the ordering of pattern matching during resolution. In a pure logic programming environment, the order of attempted matches that take place during resolution is nondeterministic, and all matches could be attempted concurrently. However, because Prolog always matches in the same order, starting at the beginning of the database and at the left end of a given goal, the user can profoundly affect efficiency by ordering the database statements to optimize a particular application. For example, if the user knows that certain rules are much more likely to succeed than the others during a particular “execution,” then the program can be made more efficient by placing those rules at the beginning of the database.
In addition to allowing the user to control database and subgoal ordering, Prolog, in another concession to efficiency, allows some explicit control of backtracking. This is done with the cut operator, which is specified by an exclamation point (!). The cut operator is actually a goal, not an operator. As a goal, it always succeeds immediately, but it cannot be resatisfied through backtracking. Thus, a side effect of the cut is that subgoals to its left in a compound goal also cannot be resatisfied through backtracking. For example, in the goal
a, b, !, c, d.
if both a and b succeed but c fails, the whole goal fails. This goal would be used if it were known that whenever c fails, it is a waste of time to resatisfy b or a.
The purpose of the cut then is to allow the user to make programs more efficient by telling the system when it should not attempt to resatisfy subgoals that presumably could not result in a complete proof.
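The behavior of the cut can be observed with a small sketch. All of the facts and predicate names below are hypothetical and were chosen only to keep the example short.

```prolog
% Hypothetical facts.
a(1).
a(2).
c(2).

% Without a cut, a failure of c(X) sends control back to a(X),
% which supplies its next instantiation.
without_cut(X) :- a(X), c(X).

% With a cut, a(X) cannot be resatisfied once it has succeeded,
% so a failure of c(X) makes the whole rule fail.
with_cut(X) :- a(X), !, c(X).
```

The query without_cut(X) instantiates X to 2 after backtracking past a(1). The query with_cut(X) fails: X is first instantiated to 1, c(1) fails, and the cut prevents any return to a(X). In this sketch the cut discards a subgoal solution that would have led to a proof, so it illustrates why the cut should be placed only where the abandoned alternatives are known to be useless.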
As an example of the use of the cut operator, consider the member rules from Section 16.6.7, which are:
member(Element, [Element | _]).
member(Element, [_ | List]) :- member(Element, List).
If the list argument to member represents a set, then it can be satisfied only once (sets contain no duplicate elements). Therefore, if member is used as a subgoal in a multiple subgoal goal statement, there can be a problem. The problem is that if member succeeds but the next subgoal fails, backtracking will attempt to resatisfy member by continuing a prior match. But because the list argument to member has only one copy of the element to begin with, member cannot possibly succeed again, which eventually causes the whole goal to fail, in spite of any additional attempts to resatisfy member. For example, consider the goal:
dem_candidate(X) :- member(X, democrats), tests(X).
This goal determines whether a given person is a democrat and is a good candidate to run for a particular position. The tests subgoal checks a variety of characteristics of the given democrat to determine the suitability of the person for the position. If the set of democrats has no duplicates, then we do not want to back up to the member subgoal if the tests subgoal fails, because member will search all of the other democrats and still fail, since there are no duplicates. The second attempt of the member subgoal will be a waste of computation time. The solution to this inefficiency is to add a right side to the first statement of the member definition, with the cut operator as the sole element, as in
member(Element, [Element | _]) :- !.
Backtracking will not attempt to resatisfy member but instead will cause the entire subgoal to fail.
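The complete definition with the cut can be sketched as follows. It is given the hypothetical name member_once here so that it can be loaded alongside the original definition for comparison.

```prolog
% member_once(?Element, +List): like member, but commits to the
% first match and leaves nothing for backtracking to retry.
member_once(Element, [Element | _]) :- !.
member_once(Element, [_ | List]) :- member_once(Element, List).
```

The query member_once(b, [a, b, c]) succeeds exactly once. The cost of the cut shows up when the first parameter is a variable: member_once(X, [a, b, c]) instantiates X to a and offers no further answers, whereas the original member would go on to produce b and c. This version is therefore suited to testing a given element, as in the dem_candidate example, and not to enumerating the elements of a list.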
Cut is particularly useful in a programming strategy in Prolog called generate and test. In programs that use the generate-and-test strategy, the goal consists of subgoals that generate potential solutions, which are then checked by later “test” subgoals. Rejected solutions require backtracking to “generator” subgoals, which generate new potential solutions. As an example of a generate-and-test program, consider the following, which appears in Clocksin and Mellish (2013):
divide(N1, N2, Result) :- is_integer(Result),
    Product1 is Result * N2,
    Product2 is (Result + 1) * N2,
    Product1 =< N1, Product2 > N1, !.
This program performs integer division, using addition and multiplication. Because most Prolog systems provide division as an operator, this program actually is not useful, other than to illustrate a simple generate-and-test program.
The predicate is_integer succeeds as long as its parameter can be instantiated to some nonnegative integer. If its argument is not instantiated, is_integer instantiates it to the value 0. If the argument is instantiated to an integer, is_integer instantiates it to the next larger integer value.
So, in divide, is_integer is the generator subgoal. It generates elements of the sequence 0, 1, 2, . . . , one each time it is satisfied. All of the other subgoals are the testing subgoals—they check to determine whether the value produced by is_integer is, in fact, the quotient of the first two parameters, N1 and N2. The purpose of the cut as the last subgoal is simple: It prevents divide from ever trying to find an alternative solution once it has found the solution. Although is_integer can generate a huge number of candidates, only one is the solution, so the cut here prevents useless attempts to produce secondary solutions.
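To make the generator and the tests concrete, here is a minimal runnable sketch. The two-clause definition of is_integer is an assumption: the text describes its behavior but does not list it, and this is simply one common way to write such a generator. The sketch also assumes N1 is nonnegative and N2 is positive, and that Result is uninstantiated when divide is called; outside those conditions the generator may run forever.

```prolog
% Generator (assumed definition): yields 0, 1, 2, ... on backtracking.
is_integer(0).
is_integer(X) :- is_integer(Y), X is Y + 1.

% Generate-and-test integer division, as given in the text.
divide(N1, N2, Result) :-
    is_integer(Result),               % generate a candidate quotient
    Product1 is Result * N2,
    Product2 is (Result + 1) * N2,
    Product1 =< N1, Product2 > N1,    % test the candidate
    !.                                % stop after the first solution
```

For example, the query divide(17, 5, R) rejects the candidates 0, 1, and 2 and then succeeds with R = 3. Without the final cut, a request for another solution would send is_integer off generating 4, 5, 6, and so on, none of which can pass the tests.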
Use of the cut operator has been compared to the use of the goto in imperative languages (van Emden, 1980). Although it is sometimes needed, it is possible to abuse it. Indeed, it is sometimes used to make logic programs have a control flow that is inspired by imperative programming styles.
The ability to tamper with control flow in a Prolog program is a deficiency, because it is directly detrimental to one of the important advantages of logic programming—that programs do not specify how solutions are to be found. Rather, they simply specify what the solution should look like. This design makes programs easier to write and easier to read. They are not cluttered with the details of how the solutions are to be determined and, in particular, the precise order in which the computations are done to produce the solution. So, while logic programming requires no control flow directions, Prolog programs frequently use them, mostly for the sake of efficiency.
The nature of Prolog’s resolution sometimes creates misleading results. The only truths, as far as Prolog is concerned, are those that can be proved using its database. It has no knowledge of the world other than its database. When the system receives a query and the database does not have information to prove the query absolutely, the query is assumed to be false. This is known as the closed-world assumption. Prolog can prove that a given goal is true, but it cannot prove that a given goal is false. It simply assumes that, because it cannot prove a goal true, the goal must be false. In essence, Prolog is a true/fail system, rather than a true/false system.
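A small sketch may help make the true/fail distinction concrete. The helper answer/2 and the person mary are hypothetical, introduced only for this illustration; the helper is meant to be called with its second argument uninstantiated.

```prolog
% The entire "world" known to the system.
parent(bill, jake).
parent(bill, shelley).

% Hypothetical helper: report whether a goal can be proved.
% Note that the second outcome is "not provable", not "false".
answer(Goal, proved) :- call(Goal), !.
answer(_, not_provable).
```

Here answer(parent(bill, jake), A) gives A = proved, while answer(parent(bill, mary), A) gives A = not_provable. Bill may well have a daughter named Mary; the database simply does not say so, and Prolog treats the missing fact as a failure.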
Actually, the closed-world assumption should not be at all foreign to you—our judicial system operates the same way. Suspects are innocent until proven guilty. They need not be proven innocent. If a trial cannot prove a person guilty, he or she is considered innocent.
The problem of the closed-world assumption is related to the negation problem, which is discussed in the following subsection.
Another problem with Prolog is its difficulty with negation. Consider the following database of two facts and a relationship:
parent(bill, jake).
parent(bill, shelley).
sibling(X, Y) :- parent(M, X), parent(M, Y).
Now, suppose we typed the query
sibling(X, Y).
Prolog will respond with
X = jake
Y = jake
Thus, Prolog “thinks” jake is a sibling of himself. This happens because the system first instantiates M with bill and X with jake to make the first subgoal, parent(M, X), true. It then starts at the beginning of the database again to match the second subgoal, parent(M, Y), and arrives at the instantiations of M with bill and Y with jake. Because the two subgoals are satisfied independently, with both matchings starting at the database’s beginning, the shown response appears. To avoid this result, X must be specified to be a sibling of Y only if they have the same parents and they are not the same. Unfortunately, stating that they are not equal is not straightforward in Prolog, as we will discuss. The most exacting method would require adding a fact for every pair of atoms, stating that they were not the same. This can, of course, cause the database to become very large, for there is often far more negative information than positive information. For example, most people have 364 more unbirthdays than they have birthdays.
A simple alternative solution is to state in the goal that X must not be the same as Y, as in
sibling(X, Y) :- parent(M, X), parent(M, Y), not(X = Y).
In other situations, the solution is not so simple.
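The sibling rule with the added inequality test can be tried out with the short sketch below. It uses \+, the standard spelling of negation as failure, in place of not; that substitution is a choice made for this example, not something taken from the text.

```prolog
parent(bill, jake).
parent(bill, shelley).

% X and Y are siblings if they share a parent and cannot be unified.
% The negated test comes last, after X and Y have been instantiated.
sibling(X, Y) :- parent(M, X), parent(M, Y), \+ X = Y.
```

The query sibling(X, Y) now produces the pairs jake and shelley, then shelley and jake, and no longer reports jake as his own sibling. The position of the test matters: if \+ X = Y were placed first, X and Y would still be uninstantiated, X = Y would succeed by unifying them, and the negated goal would always fail.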
The Prolog not operator is satisfied in this case if resolution cannot satisfy the subgoal X = Y. Therefore, if the not succeeds, it does not necessarily mean that X is not equal to Y; rather, it means that resolution cannot prove from the database that X is the same as Y. Thus, the Prolog not operator is not equivalent to a logical NOT operator, for which NOT holds only when its operand is provably false. This nonequivalency can lead to a problem if we happen to have a goal of the form
not(not(some_goal)).
which would be equivalent to
some_goal.
if Prolog’s not operator were a true logical NOT operator. In some cases, however, they are not the same. For example, consider again the member rules:
member(Element, [Element | _]) :- !.
member(Element, [_ | List]) :- member(Element, List).
To discover one of the elements of a given list, we could use the goal
member(X, [mary, fred, barb]).
which would cause X to be instantiated with mary, which would then be printed. But if we used
not(not(member(X, [mary, fred, barb]))).
the following sequence of events would take place: First, the inner goal would succeed, instantiating X to mary. Then, Prolog would attempt to satisfy the next goal:
not(member(X, [mary, fred, barb])).
This statement would fail because member succeeded. When this goal failed, X would be uninstantiated, because Prolog always uninstantiates all variables in all goals that fail. Next, Prolog would attempt to satisfy the outer not goal, which would succeed, because its argument had failed. Finally, the result, which is X, would be printed. But X would not be currently instantiated, so the system would indicate that. Generally, uninstantiated variables are printed in the form of a string of digits preceded by an underscore. So, the fact that Prolog’s not is not equivalent to a logical NOT can be, at the very least, misleading.
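The sequence of events above can be reproduced with a short sketch. The predicate names member_once, first_binding, and double_negation are hypothetical, chosen to avoid clashing with a built-in member, and \+ is used as the standard spelling of not.

```prolog
member_once(Element, [Element | _]) :- !.
member_once(Element, [_ | List]) :- member_once(Element, List).

% Succeeds with X instantiated to the first element of the list.
first_binding(X) :- member_once(X, [mary, fred, barb]).

% Also succeeds, but every binding made inside \+ is undone,
% so X is still uninstantiated afterwards.
double_negation(X) :- \+ \+ member_once(X, [mary, fred, barb]).
```

The query first_binding(X) reports X = mary, while double_negation(X) succeeds and leaves X unbound. The two goals would be interchangeable if \+ were a logical NOT.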
The fundamental reason why logical NOT cannot be an integral part of Prolog is the form of the Horn clause:
A :- B1 ∩ B2 ∩ . . . ∩ Bn
If all the B propositions are true, it can be concluded that A is true. But regardless of the truth or falseness of any or all of the B’s, it cannot be concluded that A is false. From positive logic, one can conclude only positive logic. Thus, the use of Horn clause form prevents any negative conclusions.
A fundamental goal of logic programming, as stated in Section 16.4, is to provide nonprocedural programming; that is, a system by which programmers specify what a program is supposed to do but need not specify how that is to be accomplished. The example given there for sorting is rewritten here:
sort(old_list, new_list) ⊂ permute(old_list, new_list) ∩ sorted(new_list)
sorted(list) ⊂ ∀j such that 1 ≤ j < n, list(j) ≤ list(j + 1)
It is straightforward to write this in Prolog. For example, the sorted subgoal can be expressed as
sorted([]).
sorted([_]).
sorted([X, Y | List]) :- X =< Y, sorted([Y | List]).
The problem with this sort process is that it has no idea of how to sort, other than simply to enumerate all permutations of the given list until it happens to create the one that has the list in sorted order—a very slow process, indeed.
So far, no one has discovered a process by which the description of a sorted list can be transformed into some efficient algorithm for sorting. Resolution is capable of many interesting things, but certainly not this. Therefore, a Prolog program that sorts a list must specify the details of how that sorting can be done, as is the case in an imperative or functional language.
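For reference, the specification-style sort discussed above can be written out in full as a sketch. The helper predicates permute and take are one possible formulation and are assumptions of this example, and the top-level predicate is called naive_sort because many Prolog systems already define a built-in sort.

```prolog
% permute(List, Perm): Perm is a permutation of List.
permute([], []).
permute(List, [X | Perm]) :- take(X, List, Rest), permute(Rest, Perm).

% take(X, List, Rest): Rest is List with one occurrence of X removed.
take(X, [X | Tail], Tail).
take(X, [Y | Tail], [Y | Rest]) :- take(X, Tail, Rest).

sorted([]).
sorted([_]).
sorted([X, Y | List]) :- X =< Y, sorted([Y | List]).

% Generate permutations until one of them happens to be sorted.
naive_sort(Old, New) :- permute(Old, New), sorted(New).
```

The query naive_sort([3, 1, 2], S) yields S = [1, 2, 3], but only after permute has proposed and sorted has rejected several other orderings. A list of n elements has n! permutations, so this approach becomes impractical very quickly.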
Do all of these problems mean that logic programming should be abandoned? Absolutely not! As it is, it is capable of dealing with many useful applications. Furthermore, it is based on an intriguing concept and is therefore interesting in and of itself. Finally, there is the possibility that new inferencing techniques will be developed that will allow a logic programming language system to efficiently deal with progressively larger classes of problems.
In this section, we briefly describe a few of the larger classes of present and potential applications of logic programming in general and Prolog in particular.
Relational database management systems (RDBMSs) store data in the form of tables. Queries on such databases are often stated in Structured Query Language (SQL). SQL is nonprocedural in the same sense that logic programming is nonprocedural. The user does not describe how to retrieve the answer; rather, he or she describes only the characteristics of the answer. The connection between logic programming and RDBMSs should be obvious. Simple tables of information can be described by Prolog structures, and relationships between tables can be conveniently and easily described by Prolog rules. The retrieval process is inherent in the resolution operation. The goal statements of Prolog provide the queries for the RDBMS. Logic programming is thus a natural match to the needs of implementing an RDBMS.
One of the advantages of using logic programming to implement an RDBMS is that only a single language is required. In a typical RDBMS, a database language includes statements for data definitions, data manipulation, and queries, all of which are embedded in a general-purpose programming language, such as COBOL. The general-purpose language is used for processing the data and input and output functions. All of these functions can be done in a logic programming language.
Another advantage of using logic programming to implement an RDBMS is that deductive capability is built in. Conventional RDBMSs cannot deduce anything from a database other than what is explicitly stored in them. They contain only facts, rather than facts and inference rules. The primary disadvantage of using logic programming for an RDBMS, compared with a conventional RDBMS, is that the logic programming implementation is slower. Logical inferences take longer than ordinary table look-up methods using imperative programming techniques.
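As a rough sketch of this correspondence, two small tables can be written as facts, with rules acting as joins and as deduced relationships. The table names, columns, and data below are all hypothetical.

```prolog
% Two hypothetical "tables" expressed as facts:
% employee(Name, DeptId) and department(DeptId, DeptName).
employee(alice, 10).
employee(bob, 20).
employee(carol, 10).
department(10, research).
department(20, sales).

% A rule plays the role of a join (or view) across the two tables.
works_in(Name, DeptName) :-
    employee(Name, DeptId),
    department(DeptId, DeptName).

% A deduced relationship that is not stored anywhere explicitly.
colleague(A, B) :-
    employee(A, DeptId),
    employee(B, DeptId),
    A \== B.
```

The goal works_in(Who, research) acts as a query and returns alice and then carol, and colleague(alice, B) deduces B = carol even though no colleague facts were ever stored.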
Expert systems are computer systems designed to emulate human expertise in some particular domain. They consist of a database of facts, an inferencing process, some heuristics about the domain, and some friendly human interface that makes the system appear much like an expert human consultant. In addition to their initial knowledge base, which is provided by a human expert, expert systems learn from the process of being used, so their databases must be capable of growing dynamically. Also, an expert system should include the capability of interrogating the user to get additional information when it determines that such information is needed.
One of the central problems for the designer of an expert system is dealing with the inevitable inconsistencies and incompleteness of the database. Logic programming appears to be well suited to deal with these problems. For example, default inference rules can help deal with the problem of incompleteness.
Prolog can be and has been used to construct expert systems. It can easily fulfill the basic needs of expert systems, using resolution as the basis for query processing, using its ability to add facts and rules to provide the learning capability, and using its trace facility to inform the user of the “reasoning” behind a given result. Missing from Prolog is the automatic ability of the system to query the user for additional information when it is needed.
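The ability to add facts at run time can be sketched as follows. The tiny knowledge base, the diagnosis rule, and the learn/1 helper are hypothetical and far simpler than a real expert system; they only show a database growing while the program runs.

```prolog
:- dynamic symptom/2.

% Hypothetical initial knowledge base.
symptom(pat, fever).

% A heuristic rule of the kind a human expert might supply.
diagnosis(Patient, flu) :-
    symptom(Patient, fever),
    symptom(Patient, cough).

% "Learning": record a new fact unless it is already known.
learn(Fact) :- call(Fact), !.
learn(Fact) :- assertz(Fact).
```

Initially diagnosis(pat, flu) fails. After learn(symptom(pat, cough)) has been executed, the same goal succeeds, and repeating the learn call does not store the fact a second time.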
One of the most widely known uses of logic programming in expert systems is the expert system construction system known as APES, which is described in Sergot (1983) and Hammond (1983). The APES system includes a very flexible facility for gathering information from the user during expert system construction. It also includes a second interpreter for producing explanations of its answers to queries.
APES has been successfully used to produce several expert systems, including one for the rules of a government social benefits program and one for the British Nationality Act, which is the definitive source for rules of British citizenship.
Certain kinds of natural-language processing can be done with logic programming. In particular, natural-language interfaces to computer software systems, such as intelligent databases and other intelligent knowledge-based systems, can be conveniently done with logic programming. For describing language syntax, forms of logic programming have been found to be equivalent to context-free grammars. Proof procedures in logic programming systems have been found to be equivalent to certain parsing strategies. In fact, backward-chaining resolution can be used directly to parse sentences whose structures are described by context-free grammars. It has also been discovered that some kinds of semantics of natural languages can be made clear by modeling the languages with logic programming. In particular, research in logic-based semantic networks has shown that sets of sentences in natural languages can be expressed in clausal form (Deliyanni and Kowalski, 1979). Kowalski (1979) also discusses logic-based semantic networks.
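The link between context-free grammars and resolution can be illustrated with definite clause grammar (DCG) notation, which many Prolog systems provide. The text does not mention DCGs, so both the notation and the toy grammar below are assumptions of this sketch. Each grammar rule is translated by the system into an ordinary clause, and parsing is then carried out by the usual backward-chaining resolution.

```prolog
% A tiny hypothetical context-free grammar in DCG notation.
sentence    --> noun_phrase, verb_phrase.
noun_phrase --> determiner, noun.
verb_phrase --> verb, noun_phrase.
determiner  --> [the].
determiner  --> [a].
noun        --> [dog].
noun        --> [cat].
verb        --> [chases].
verb        --> [sees].
```

With the standard phrase/2 predicate, phrase(sentence, [the, dog, chases, a, cat]) succeeds, while phrase(sentence, [dog, the, chases]) fails because no derivation exists. Called with an uninstantiated list, the same grammar enumerates the sentences it describes.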
Symbolic logic provides the basis for logic programming and logic programming languages. The approach of logic programming is to use as a database a collection of facts and rules that state relationships between facts and to use an automatic inferencing process to check the validity of new propositions, assuming the facts and rules of the database are true. This approach is the one developed for automatic theorem proving.
Prolog is the most widely used logic programming language. The origins of logic programming lie in Robinson’s development of the resolution rule for logical inference. Prolog was developed primarily by Colmerauer and Roussel at Marseille, with some help from Kowalski at Edinburgh.
Logic programs are nonprocedural, which means that the characteristics of the solution are given but the process of getting the solution is not.
Prolog statements are facts, rules, or goals. Most are made up of structures, which are atomic propositions, and logic operators, although arithmetic expressions are also allowed.
Resolution is the primary activity of a Prolog interpreter. This process, which uses backtracking extensively, involves mainly pattern matching among propositions. When variables are involved, they can be instantiated to values to provide matches. This instantiation process is called unification.
There are a number of problems with the current state of logic programming. For reasons of efficiency, and even to avoid infinite loops, programmers must sometimes state control flow information in their programs. Also, there are the problems of the closed-world assumption and negation.
Logic programming has been used in a number of different areas, primarily in relational database systems, expert systems, and natural-language processing.
The Prolog language is described in several books. Edinburgh’s form of the language is covered in Clocksin and Mellish (2003). The microcomputer implementation is described in Clark and McCabe (1984).
Hogger (1991) is an excellent book on the general topic of logic programming. It is the source of the material in this chapter’s section on logic programming applications.
What are the three primary uses of symbolic logic in formal logic?
What are the two parts of a compound term?
What are the two modes in which a proposition can be stated?
What is the general form of a proposition in clausal form?
What are antecedents? Consequents?
Give general (not rigorous) definitions of resolution and unification.
What are the forms of Horn clauses?
What is the basic concept of declarative semantics?
What does it mean for a language to be nonprocedural?
What are the three forms of a Prolog term?
What is an uninstantiated variable?
What are the syntactic forms and usage of fact and rule statements in Prolog?
What is a conjunction?
Explain the two approaches to matching goals to facts in a database.
Explain the difference between a depth-first and a breadth-first search when discussing how multiple goals are satisfied.
Explain how backtracking works in Prolog.
Explain what is wrong with the Prolog statement K is K + 1.
What are the two ways a Prolog programmer can control the order of pattern matching during resolution?
Explain the generate-and-test programming strategy in Prolog.
Explain the closed-world assumption used by Prolog. Why is this a limitation?
Explain the negation problem with Prolog. Why is this a limitation?
Explain the connection between automatic theorem proving and Prolog’s inferencing process.
Explain the difference between procedural and nonprocedural languages.
Explain why Prolog systems must do backtracking.
What is the relationship between resolution and unification in Prolog?
Compare the concept of data typing in C# with that of Prolog.
Describe how a multiple-processor machine could be used to implement resolution. Could Prolog, as currently defined, use this method?
Write a Prolog description of your family tree (based only on facts), going back to your grandparents and including all descendants. Be sure to include all relationships.
Write a set of rules for family relationships, including all relationships from grandparents through two generations. Now add these to the facts of Problem 3, and eliminate as many of the facts as you can.
Write the following English conditional statements as headed Horn clauses in Prolog:
If Fred is the father of Mike, then Fred is an ancestor of Mike.
If Mike is the father of Joe and Mike is the father of Mary, then Mary is the sister of Joe.
If Mike is the brother of Fred and Fred is the father of Mary, then Mike is the uncle of Mary.
Explain two ways in which the list-processing capabilities of Scheme and Prolog are similar.
In what way are the list-processing capabilities of Scheme and Prolog different?
Write a comparison of Prolog with ML, including two similarities and two differences.
From a book on Prolog, learn and write a description of an occur-check problem. Why does Prolog allow this problem to exist in its implementation?
Find a good source of information on Skolem normal form and write a brief but clear explanation of it.
Using the structures parent(X, Y), male(X), and female(X), write a structure that defines mother(X, Y).
Using the structures parent(X, Y), male(X), and female(X), write a structure that defines sister(X, Y).
Write a Prolog program that finds the maximum of a list of numbers.
Write a Prolog program that succeeds if the intersection of two given list parameters is empty.
Write a Prolog program that returns a list containing the union of the elements of two given lists.
Write a Prolog program that returns the final element of a given list.
Write a Prolog program that implements quicksort.
ACM。(1979)“A 部分:初步 Ada 参考手册”和“B 部分:Ada 编程语言设计原理”。SIGPLAN 通告,第 14 卷,第 6 期。
ACM. (1979) “Part A: Preliminary Ada Reference Manual” and “Part B: Rationale for the Design of the Ada Programming Language.” SIGPLAN Notices, Vol. 14, No. 6.
ACM. (1993a) “编程语言会议论文集历史。”ACM SIGPLAN 通知,第 28 卷,第 3 期,3 月。
ACM. (1993a) “History of Programming Language Conference Proceedings.” ACM SIGPLAN Notices, Vol. 28, No. 3, March.
ACM. (1993b) “高性能 FORTRAN 语言规范第 1 部分。”FORTRAN 论坛,第 12 卷,第 4 期。
ACM. (1993b) “High Performance FORTRAN Language Specification Part 1.” FORTRAN Forum, Vol. 12, No. 4.
Aho, AV、BW Kernighan 和 PJ Weinberger。(1988) AWK 编程语言。Addison-Wesley,马萨诸塞州雷丁。
Aho, A. V., B. W. Kernighan, and P. J. Weinberger. (1988) The AWK Programming Language. Addison-Wesley, Reading, MA.
Aho, AV、MS Lam、R. Sethi 和 JD Ullman。(2006) 编译器:原理、技术和工具,第二版。Addison-Wesley,马萨诸塞州雷丁。
Aho, A. V., M. S. Lam, R. Sethi, and J. D. Ullman. (2006) Compilers: Principles, Techniques, and Tools, 2e. Addison-Wesley, Reading, MA.
Albahari, J. 和 B. Abrahari (2012)《C# 5.0 简介》,O'Reilly Media,加利福尼亚州塞巴斯托波尔。
Albahari, J. and B. Abrahari (2012) C# 5.0 in a Nutshell, O’Reilly Media, Sebastopol, CA.
Andrews, GR 和 FB Schneider。(1983)“并发编程的概念和符号”。ACM 计算调查,第 15 卷,第 1 期,第 3-43 页。
Andrews, G. R., and F. B. Schneider. (1983) “Concepts and Notations for Concurrent Programming.” ACM Computing Surveys, Vol. 15, No. 1, pp. 3–43.
ANSI。(1966)美国国家标准编程语言 FORTRAN。美国国家标准研究所,纽约。
ANSI. (1966) American National Standard Programming Language FORTRAN. American National Standards Institute, New York.
(1976) 美国国家标准编程语言 PL/I。ANSI X3.53–1976。美国国家标准研究所,纽约。
ANSI. (1976) American National Standard Programming Language PL/I. ANSI X3.53–1976. American National Standards Institute, New York.
ANSI。(1978a)美国国家标准编程语言 FORTRAN。ANSI X3.9–1978。美国国家标准研究所,纽约。
ANSI. (1978a) American National Standard Programming Language FORTRAN. ANSI X3.9–1978. American National Standards Institute, New York.
ANSI。(1978b)美国国家标准编程语言最小 BASIC。ANSI X3.60–1978。美国国家标准研究所,纽约。
ANSI. (1978b) American National Standard Programming Language Minimal BASIC. ANSI X3.60–1978. American National Standards Institute, New York.
(1989) 美国国家标准编程语言 C。ANSI X3.159–1989。美国国家标准研究所,纽约。
ANSI. (1989) American National Standard Programming Language C. ANSI X3.159–1989. American National Standards Institute, New York.
ANSI。(1992)美国国家标准编程语言 FORTRAN 90。ANSI X3。198–1992。美国国家标准研究所,纽约。
ANSI. (1992) American National Standard Programming Language FORTRAN 90. ANSI X3. 198–1992. American National Standards Institute, New York.
Arden, BW、BA Galler 和 RM Graham。(1961)“密歇根的 MAD”。Datamation,第 7 卷,第 12 期,第 27-28 页。
Arden, B. W., B. A. Galler, and R. M. Graham. (1961) “MAD at Michigan.” Datamation, Vol. 7, No. 12, pp. 27–28.
ARM。(1995)Ada 参考手册。ISO/IEC/ANSI 8652:19。Intermetrics,马萨诸塞州剑桥。
ARM. (1995) Ada Reference Manual. ISO/IEC/ANSI 8652:19. Intermetrics, Cambridge, MA.
Arnold, K.、J. Gosling 和 D. Holmes。(2006) Java (TM) 编程语言,第 4 版。Addison-Wesley,马萨诸塞州雷丁。
Arnold, K., J. Gosling, and D. Holmes. (2006) The Java (TM) Programming Language, 4e. Addison-Wesley, Reading, MA.
Backus, J. (1954)“IBM 701 快速编码系统。”J. ACM,第 1 卷,第 4-6 页。
Backus, J. (1954) “The IBM 701 Speedcoding System.” J. ACM, Vol. 1, pp. 4–6.
Backus, J. (1959) “苏黎世 ACM-GAMM 会议提议的国际代数语言的语法和语义。”信息处理国际会议论文集。联合国教科文组织,巴黎,第 125-132 页。
Backus, J. (1959) “The Syntax and Semantics of the Proposed International Algebraic Language of the Zurich ACM-GAMM Conference.” Proceedings International Conference on Information Processing. UNESCO, Paris, pp. 125–132.
Backus, J. (1978) “编程能从冯·诺依曼风格中解放出来吗?一种函数式风格及其程序代数。” Commun. ACM,第 21 卷,第 8 期,第 613-641 页。
Backus, J. (1978) “Can Programming Be Liberated from the von Neumann Style? A Functional Style and Its Algebra of Programs.” Commun. ACM, Vol. 21, No. 8, pp. 613–641.
Backus, J.、FL Bauer、J. Green、C. Katz、J. McCarthy、P. Naur、AJ Perlis、H. Rutishauser、K. Samelson、B. Vauquois、JH Wegstein、A. van Wijngaarden 和 M.伍德格。 (1963)“算法语言 ALGOL 60 的修订报告。”交流。 ACM,卷。 6,第 1 期,第 1-17 页。
Backus, J., F. L. Bauer, J. Green, C. Katz, J. McCarthy, P. Naur, A. J. Perlis, H. Rutishauser, K. Samelson, B. Vauquois, J. H. Wegstein, A. van Wijngaarden, and M. Woodger. (1963) “Revised Report on the Algorithmic Language ALGOL 60.” Commun. ACM, Vol. 6, No. 1, pp. 1–17.
Ben-Ari, M. (1982)《并发编程原理》。Prentice Hall,新泽西州恩格尔伍德克利夫斯。
Ben-Ari, M. (1982) Principles of Concurrent Programming. Prentice Hall, Englewood Cliffs, NJ.
Birtwistle,总经理,O.-J。达尔、B. Myhrhaug 和 K. Nygaard。 (1973) 模拟开始。范·诺斯特兰德·莱因霍尔德,纽约。
Birtwistle, G. M., O.-J. Dahl, B. Myhrhaug, and K. Nygaard. (1973) Simula BEGIN. Van Nostrand Reinhold, New York.
Bodwin, JM、L. Bradley、K. Kanda、D. Litle 和 UF Pleban。(1982)“基于指称语义的实验性编译器生成器的经验。”ACM SIGPLAN Notices,第 17 卷,第 6 期,第 216-229 页。
Bodwin, J. M., L. Bradley, K. Kanda, D. Litle, and U. F. Pleban. (1982) “Experience with an Experimental Compiler Generator Based on Denotational Semantics.” ACM SIGPLAN Notices, Vol. 17, No. 6, pp. 216–229.
Bohm, C. 和 G. Jacopini。(1966)“流程图、图灵机和只有两种形成规则的语言。”Commun. ACM,第 9 卷,第 5 期,第 366-371 页。
Bohm, C., and G. Jacopini. (1966) “Flow Diagrams, Turing Machines, and Languages with Only Two Formation Rules.” Commun. ACM, Vol. 9, No. 5, pp. 366–371.
Bolsky, M., and D. Korn. (1995) The New KornShell Command and Programming Language. Prentice Hall, Englewood Cliffs, NJ.
Booch, G. (1987) Software Engineering with Ada, 2e. Benjamin/Cummings, Redwood City, CA.
Brinch Hansen, P. (1973) Operating System Principles. Prentice Hall, Englewood Cliffs, NJ.
Brinch Hansen, P. (1975) “The Programming Language Concurrent-Pascal.” IEEE Transactions on Software Engineering, Vol. 1, No. 2, pp. 199–207.
Brinch Hansen, P. (1977) The Architecture of Concurrent Programs. Prentice Hall, Englewood Cliffs, NJ.
Brinch Hansen, P. (1978) “Distributed Processes: A Concurrent Programming Concept.” Commun. ACM, Vol. 21, No. 11, pp. 934–941.
Brown, J. A., S. Pakin, and R. P. Polivka. (1988) APL2 at a Glance. Prentice Hall, Englewood Cliffs, NJ.
Campione, M., K. Walrath, and A. Huml. (2001) The Java Tutorial, 3e. Addison-Wesley, Reading, MA.
Chambers, C., and D. Ungar. (1991) “Making Pure Object-Oriented Languages Practical.” SIGPLAN Notices, Vol. 26, No. 1, pp. 1–15.
Chomsky, N. (1956) “Three Models for the Description of Language.” IRE Transactions on Information Theory, Vol. 2, No. 3, pp. 113–124.
Chomsky, N. (1959) “On Certain Formal Properties of Grammars.” Information and Control, Vol. 2, No. 2, pp. 137–167.
Christiansen, T., B. D. Foy, and L. Wall, with J. Orwant. (2013) Programming Perl, 4e. O’Reilly & Associates, Sebastopol, CA.
Church, A. (1941) The Calculi of Lambda-Conversion. Annals of Mathematics Studies, Vol. 6. Princeton University Press, Princeton, NJ. Reprinted by Kraus Reprint Corporation, New York, 1965.
Clark, K. L., and F. G. McCabe. (1984) Micro-PROLOG: Programming in Logic. Prentice Hall, Englewood Cliffs, NJ.
Clarke, L. A., J. C. Wileden, and A. L. Wolf. (1980) “Nesting in Ada Is for the Birds.” ACM SIGPLAN Notices, Vol. 15, No. 11, pp. 139–145.
Cleaveland, J. C. (1986) An Introduction to Data Types. Addison-Wesley, Reading, MA.
Cleaveland, J. C., and R. C. Uzgalis. (1976) Grammars for Programming Languages: What Every Programmer Should Know About Grammar. American Elsevier, New York.
Clocksin, W. F., and C. S. Mellish. (2013) Programming in Prolog: Using the ISO Standard. Springer-Verlag, New York.
Cohen, J. (1981) “Garbage Collection of Linked Data Structures.” ACM Computing Surveys, Vol. 13, No. 3, pp. 341–368.
Conway, R., and R. Constable. (1976) “PL/CS—A Disciplined Subset of PL/I.” Technical Report TR76/293. Department of Computer Science, Cornell University, Ithaca, NY.
Cornell University. (1977) PL/C User’s Guide, Release 7.6. Department of Computer Science, Cornell University, Ithaca, NY.
Correa, N. (1992) “Empty Categories, Chain Binding, and Parsing.” In Principle-Based Parsing, R. C. Berwick, S. P. Abney, and C. Tenny (eds.). Kluwer Academic Publishers, Boston, pp. 83–121.
Cousineau, G., M. Mauny, and K. Callaway. (1998) The Functional Approach to Programming. Cambridge University Press, Cambridge, UK.
Dahl, O.-J., E. W. Dijkstra, and C. A. R. Hoare. (1972) Structured Programming. Academic Press, New York.
Dahl, O.-J., and K. Nygaard. (1967) SIMULA 67 Common Base Proposal. Norwegian Computing Center Document, Oslo.
Deliyanni, A., and R. A. Kowalski. (1979) “Logic and Semantic Networks.” Commun. ACM, Vol. 22, No. 3, pp. 184–192.
Department of Defense. (1960) COBOL, Initial Specifications for a Common Business Oriented Language. U.S. Department of Defense, Washington, D.C.
Department of Defense. (1961) COBOL—1961, Revised Specifications for a Common Business Oriented Language. U.S. Department of Defense, Washington, D.C.
Department of Defense. (1962) COBOL—1961 EXTENDED, Extended Specifications for a Common Business Oriented Language. U.S. Department of Defense, Washington, D.C.
Department of Defense. (1975a) Requirements for High Order Programming Languages, STRAWMAN. July. U.S. Department of Defense, Washington, D.C.
Department of Defense. (1975b) Requirements for High Order Programming Languages, WOODENMAN. August. U.S. Department of Defense, Washington, D.C.
Department of Defense. (1976) Requirements for High Order Programming Languages, TINMAN. June. U.S. Department of Defense, Washington, D.C.
Department of Defense. (1977) Requirements for High Order Programming Languages, IRONMAN. January. U.S. Department of Defense, Washington, D.C.
Department of Defense. (1978) Requirements for High Order Programming Languages, STEELMAN. June. U.S. Department of Defense, Washington, D.C.
Department of Defense. (1980) Requirements for High Order Programming Languages, STONEMAN. February. U.S. Department of Defense, Washington, D.C.
DeRemer, F. (1971) “Simple LR(k) Grammars.” Commun. ACM, Vol. 14, No. 7, pp. 453–460.
DeRemer, F., and T. Pennello. (1982) “Efficient Computation of LALR(1) Look-Ahead Sets.” ACM TOPLAS, Vol. 4, No. 4, pp. 615–649.
Deutsch, L. P., and D. G. Bobrow. (1976) “An Efficient Incremental Automatic Garbage Collector.” Commun. ACM, Vol. 19, No. 9, pp. 522–526.
Dijkstra, E. W. (1968a) “Goto Statement Considered Harmful.” Commun. ACM, Vol. 11, No. 3, pp. 147–149.
Dijkstra, E. W. (1968b) “Cooperating Sequential Processes.” In Programming Languages, F. Genuys (ed.). Academic Press, New York, pp. 43–112.
Dijkstra, E. W. (1972) “The Humble Programmer.” Commun. ACM, Vol. 15, No. 10, pp. 859–866.
Dijkstra, E. W. (1975) “Guarded Commands, Nondeterminacy, and Formal Derivation of Programs.” Commun. ACM, Vol. 18, No. 8, pp. 453–457.
Dijkstra, E. W. (1976) A Discipline of Programming. Prentice Hall, Englewood Cliffs, NJ.
Dybvig, R. K. (2011) The Scheme Programming Language, 4e. MIT Press, Cambridge, MA.
Ellis, M. A., and B. Stroustrup. (1990) The Annotated C++ Reference Manual. Addison-Wesley, Reading, MA.
Farber, D. J., R. E. Griswold, and I. P. Polonsky. (1964) “SNOBOL, a String Manipulation Language.” J. ACM, Vol. 11, No. 1, pp. 21–30.
Farrow, R. (1982) “LINGUIST 86: Yet Another Translator Writing System Based on Attribute Grammars.” ACM SIGPLAN Notices, Vol. 17, No. 6, pp. 160–171.
Fischer, C. N., G. F. Johnson, J. Mauney, A. Pal, and D. L. Stock. (1984) “The Poe Language-Based Editor Project.” ACM SIGPLAN Notices, Vol. 19, No. 5, pp. 21–29.
Fischer, C. N., and R. J. LeBlanc. (1977) UW-Pascal Reference Manual. Madison Academic Computing Center, Madison, WI.
Fischer, C. N., and R. J. LeBlanc. (1980) “Implementation of Runtime Diagnostics in Pascal.” IEEE Transactions on Software Engineering, Vol. SE-6, No. 4, pp. 313–319.
Fischer, C. N., and R. J. LeBlanc. (1991) Crafting a Compiler with C. Benjamin-Cummings, Menlo Park, CA.
Flanagan, D. (2011) JavaScript: The Definitive Guide, 6e. O’Reilly Media, Sebastopol, CA.
Flanagan, D., and Y. Matsumoto. (2008) The Ruby Programming Language. O’Reilly Media, Sebastopol, CA.
Floyd, R. W. (1967) “Assigning Meanings to Programs.” Proceedings Symposium Applied Mathematics. Mathematical Aspects of Computer Science, J. T. Schwartz (ed.). American Mathematical Society, Providence, RI.
Frege, G. (1892) “Über Sinn und Bedeutung.” Zeitschrift für Philosophie und philosophische Kritik, Vol. 100, pp. 25–50.
Friedl, J. E. F. (2006) Mastering Regular Expressions, 3e. O’Reilly Media, Sebastopol, CA.
Friedman, D. P., and D. S. Wise. (1979) “Reference Counting’s Ability to Collect Cycles Is Not Insurmountable.” Information Processing Letters, Vol. 8, No. 1, pp. 41–45.
Fuchi, K. (1981) “Aiming for Knowledge Information Processing Systems.” Proceedings of the International Conference on Fifth Generation Computing Systems. Japan Information Processing Development Center, Tokyo. Republished (1982) by North-Holland Publishing, Amsterdam.
Gehani, N. (1983) Ada: An Advanced Introduction. Prentice Hall, Englewood Cliffs, NJ.
Gilman, L., and A. J. Rose. (1983) APL: An Interactive Approach, 3e. John Wiley, New York.
Goodenough, J. B. (1975) “Exception Handling: Issues and Proposed Notation.” Commun. ACM, Vol. 18, No. 12, pp. 683–696.
Goos, G., and J. Hartmanis (eds.). (1983) The Programming Language Ada Reference Manual. American National Standards Institute. ANSI/MIL-STD-1815A-1983. Lecture Notes in Computer Science 155. Springer-Verlag, New York.
Gordon, M. (1979) The Denotational Description of Programming Languages, An Introduction. Springer-Verlag, New York.
Graham, P. (1996) ANSI Common LISP. Prentice Hall, Englewood Cliffs, NJ.
Gries, D. (1981) The Science of Programming. Springer-Verlag, New York.
Halstead, R. H., Jr. (1985) “Multilisp: A Language for Concurrent Symbolic Computation.” ACM Transactions on Programming Languages and Systems, Vol. 7, No. 4, October 1985, pp. 501–538.
Halvorson, M. (2013) Microsoft Visual Basic 2013 Step by Step. Microsoft Press, Redmond, WA.
Hammond, P. (1983) APES: A User Manual. Department of Computing Report 82/9. Imperial College of Science and Technology, London.
Harbison, S. P., III, and G. L. Steele, Jr. (2002) C: A Reference Manual, 5e. Prentice Hall, Upper Saddle River, NJ.
Henderson, P. (1980) Functional Programming: Application and Implementation. Prentice Hall, Englewood Cliffs, NJ.
Hoare, C. A. R. (1969) “An Axiomatic Basis for Computer Programming.” Commun. ACM, Vol. 12, No. 10, pp. 576–580.
Hoare, C. A. R. (1972) “Proof of Correctness of Data Representations.” Acta Informatica, Vol. 1, pp. 271–281.
Hoare, C. A. R. (1973) “Hints on Programming Language Design.” Proceedings ACM SIGACT/SIGPLAN Conference on Principles of Programming Languages. Also published as Technical Report STAN-CS-73-403, Stanford University Computer Science Department.
Hoare, C. A. R. (1974) “Monitors: An Operating System Structuring Concept.” Commun. ACM, Vol. 17, No. 10, pp. 549–557.
Hoare, C. A. R. (1978) “Communicating Sequential Processes.” Commun. ACM, Vol. 21, No. 8, pp. 666–677.
Hoare, C. A. R. (1981) “The Emperor’s Old Clothes.” Commun. ACM, Vol. 24, No. 2, pp. 75–83.
Hoare, C. A. R., and N. Wirth. (1973) “An Axiomatic Definition of the Programming Language Pascal.” Acta Informatica, Vol. 2, pp. 335–355.
Hogger, C. J. (1984) Introduction to Logic Programming. Academic Press, London.
Hogger, C. J. (1991) Essentials of Logic Programming. Oxford Science Publications, Oxford, England.
Holt, R. C., G. S. Graham, E. D. Lazowska, and M. A. Scott. (1978) Structured Concurrent Programming with Operating Systems Applications. Addison-Wesley, Reading, MA.
Horn, A. (1951) “On Sentences Which Are True of Direct Unions of Algebras.” J. Symbolic Logic, Vol. 16, pp. 14–21.
Hudak, P., and J. Fasel. (1992) “A Gentle Introduction to Haskell.” ACM SIGPLAN Notices, Vol. 27, No. 5, May 1992, pp. T1–T53.
Hughes, J. (1989) “Why Functional Programming Matters.” The Computer Journal, Vol. 32, No. 2, pp. 98–107.
Huskey, H. K., R. Love, and N. Wirth. (1963) “A Syntactic Description of BC NELIAC.” Commun. ACM, Vol. 6, No. 7, pp. 367–375.
IBM. (1954) “Preliminary Report, Specifications for the IBM Mathematical FORmula TRANslating System, FORTRAN.” IBM Corporation, New York.
IBM. (1956) “Programmer’s Reference Manual, The FORTRAN Automatic Coding System for the IBM 704 EDPM.” IBM Corporation, New York.
IBM. (1964) The New Programming Language. IBM UK Laboratories, Hursley, England.
Ichbiah, J. D., J. C. Heliard, O. Roubine, J. G. P. Barnes, B. Krieg-Brueckner, and B. A. Wichmann. (1979) “Part B: Rationale for the Design of the Ada Programming Language.” ACM SIGPLAN Notices, Vol. 14, No. 6.
IEEE. (1985) “Binary Floating-Point Arithmetic.” IEEE Standard 754, IEEE, New York.
INCITS/ISO/IEC. (1997) 1539-1-1997, Information Technology—Programming Languages—FORTRAN, Part 1: Base Language. American National Standards Institute, New York.
Ingerman, P. Z. (1967) “Panini-Backus Form Suggested.” Commun. ACM, Vol. 10, No. 3, p. 137.
ISO. (1998) ISO14882-1, ISO/IEC Standard – Information Technology—Programming Language—C++. International Organization for Standardization, Geneva, Switzerland.
ISO. (1999) ISO/IEC 9899:1999, Programming Language C. American National Standards Institute, New York.
ISO/IEC. (1996) 14977:1996, Information Technology—Syntactic Metalanguage—Extended BNF. International Organization for Standardization, Geneva, Switzerland.
ISO/IEC. (2002) 1989:2002, Information Technology—Programming Languages—COBOL. American National Standards Institute, New York.
ISO/IEC. (2010) 1539-1, Information Technology—Programming Languages—Fortran. American National Standards Institute, New York.
ISO/IEC. (2014) 8652/2012(E), Ada 2012 Reference Manual. Springer-Verlag, New York.
Iverson, K. E. (1962) A Programming Language. John Wiley, New York.
Jensen, K., and N. Wirth. (1974) Pascal User Manual and Report. Springer-Verlag, Berlin.
Johnson, S. C. (1975) “Yacc: Yet Another Compiler-Compiler.” Computing Science Report 32. AT&T Bell Laboratories, Murray Hill, NJ.
Jones, N. D. (ed.). (1980) Semantic-Directed Compiler Generation. Lecture Notes in Computer Science, Vol. 94. Springer-Verlag, Heidelberg, FRG.
Kay, A. (1969) The Reactive Engine. PhD Thesis. University of Utah, September.
Kernighan, B. W., and D. M. Ritchie. (1978) The C Programming Language. Prentice Hall, Englewood Cliffs, NJ.
Knuth, D. E. (1965) “On the Translation of Languages from Left to Right.” Information and Control, Vol. 8, No. 6, pp. 607–639.
Knuth, D. E. (1967) “The Remaining Trouble Spots in ALGOL 60.” Commun. ACM, Vol. 10, No. 10, pp. 611–618.
Knuth, D. E. (1968) “Semantics of Context-Free Languages.” Mathematical Systems Theory, Vol. 2, No. 2, pp. 127–146.
Knuth, D. E. (1974) “Structured Programming with GOTO Statements.” ACM Computing Surveys, Vol. 6, No. 4, pp. 261–301.
Knuth, D. E. (1981) The Art of Computer Programming, Vol. II, 2e. Addison-Wesley, Reading, MA.
Knuth, D. E., and L. T. Pardo. (1977) “Early Development of Programming Languages.” In Encyclopedia of Computer Science and Technology, G. Holzman and A. Kent (eds.). Vol. 7. Dekker, New York, pp. 419–493.
Kowalski, R. A. (1979) Logic for Problem Solving. Artificial Intelligence Series, Vol. 7. Elsevier-North Holland, New York.
Laning, J. H., Jr., and N. Zierler. (1954) “A Program for Translation of Mathematical Equations for Whirlwind I.” Engineering memorandum E-364. Instrumentation Laboratory, Massachusetts Institute of Technology, Cambridge, MA.
Ledgard, H. F., and M. Marcotty. (1975) “A Genealogy of Control Structures.” Commun. ACM, Vol. 18, No. 11, pp. 629–639.
Lippman, S. B., and J. Lajoie. (2012) C++ Primer, 5e. Addison-Wesley, Upper Saddle River, NJ.
Lischner, R. (2000) Delphi in a Nutshell. O’Reilly Media, Sebastopol, CA.
Liskov, B., R. L. Atkinson, T. Bloom, J. E. B. Moss, C. Schaffert, R. Scheifler, and A. Snyder. (1981) CLU Reference Manual. Springer, New York.
Lomet, D. (1975) “Scheme for Invalidating References to Freed Storage.” IBM Journal of Research and Development, Vol. 19, pp. 26–35.
Lutz, M. (2013) Learning Python, 5e. O’Reilly Media, Sebastopol, CA.
MacLaren, M. D. (1977) “Exception Handling in PL/I.” ACM SIGPLAN Notices, Vol. 12, No. 3, pp. 101–104.
Marcotty, M., H. F. Ledgard, and G. V. Bochmann. (1976) “A Sampler of Formal Definitions.” ACM Computing Surveys, Vol. 8, No. 2, pp. 191–276.
Mather, D. G., and S. V. Waite (eds.). (1971) BASIC, 6e. University Press of New England, Hanover, NH.
McCarthy, J. (1960) “Recursive Functions of Symbolic Expressions and Their Computation by Machine, Part I.” Commun. ACM, Vol. 3, No. 4, pp. 184–195.
McCarthy, J., P. W. Abrahams, D. J. Edwards, T. P. Hart, and M. Levin. (1965) LISP 1.5 Programmer’s Manual, 2e. MIT Press, Cambridge, MA.
McCracken, D. (1970) “Whither APL.” Datamation, September 15, pp. 53–57.
Metcalf, M., J. Reid, and M. Cohen. (2004) Fortran 95/2003 Explained, 3e. Oxford University Press, Oxford, England.
Meyer, B. (1990) Introduction to the Theory of Programming Languages. Prentice Hall, Englewood Cliffs, NJ.
Milner, R., R. Harper, and M. Tofte. (1997) The Definition of Standard ML (Revised). MIT Press, Cambridge, MA.
Milos, D., U. Pleban, and G. Loegel. (1984) “Direct Implementation of Compiler Specifications.” POPL ’84 Proceedings of the 11th ACM SIGACT-SIGPLAN Symposium on Programming Languages, pp. 196–202.
Mitchell, J. G., W. Maybury, and R. Sweet. (1979) Mesa Language Manual, Version 5.0, CSL-79-3. Xerox Research Center, Palo Alto, CA.
Moss, C. (1994) Prolog++: The Power of Object-Oriented and Logic Programming. Addison-Wesley, Reading, MA.
Moto-oka, T. (1981) “Challenge for Knowledge Information Processing Systems.” Proceedings of the International Conference on Fifth Generation Computing Systems. Japan Information Processing Development Center, Tokyo. Republished (1982) by North-Holland Publishing, Amsterdam.
Naur, P. (ed.). (1960) “Report on the Algorithmic Language ALGOL 60.” Commun. ACM, Vol. 3, No. 5, pp. 299–314.
Newell, A., and H. A. Simon. (1956) “The Logic Theory Machine—A Complex Information Processing System.” IRE Transactions on Information Theory, Vol. IT-2, No. 3, pp. 61–79.
Newell, A., and F. M. Tonge. (1960) “An Introduction to Information Processing Language V.” Commun. ACM, Vol. 3, No. 4, pp. 205–211.
Nilsson, N. J. (1971) Problem Solving Methods in Artificial Intelligence. McGraw-Hill, New York.
Pagan, F. G. (1981) Formal Specifications of Programming Languages. Prentice Hall, Englewood Cliffs, NJ.
Papert, S. (1980) MindStorms: Children, Computers and Powerful Ideas. Basic Books, New York.
Perlis, A., and K. Samelson. (1958) “Preliminary Report—International Algebraic Language.” Commun. ACM, Vol. 1, No. 12, pp. 8–22.
Peyton Jones, S. L. (1987) The Implementation of Functional Programming Languages. Prentice Hall, Englewood Cliffs, NJ.
Pratt, T. W., and M. V. Zelkowitz. (2001) Programming Languages: Design and Implementation, 4e. Prentice Hall, Englewood Cliffs, NJ.
Remington-Rand. (1952) “UNIVAC Short Code.” Unpublished collection of dittoed notes. Preface by A. B. Tonik, dated October 25, 1955 (1 p.); Preface by J. R. Logan, undated but apparently from 1952 (1 p.); Preliminary exposition, 1952? (22 pp., in which pp. 20–22 appear to be a later replacement); Short code supplementary information, topic one (7 pp.); Addenda #1, 2, 3, 4 (9 pp.).
Reppy, J. H. (1999) Concurrent Programming in ML. Cambridge University Press, New York.
Richards, M. (1969) “BCPL: A Tool for Compiler Writing and Systems Programming.” Proc. AFIPS SJCC, Vol. 34, pp. 557–566.
Robbins, A. (2005) Unix in a Nutshell, 4e. O’Reilly Media, Sebastopol, CA.
Robinson, J. A. (1965) “A Machine-Oriented Logic Based on the Resolution Principle.” Journal of the ACM, Vol. 12, pp. 23–41.
Roussel, P. (1975) “PROLOG: Manuel de Référence et d’Utilisation.” Research Report. Artificial Intelligence Group, University of Aix-Marseille, Luminy, France.
Rubin, F. (1987) “‘GOTO Statement Considered Harmful’ considered harmful” (letter to editor). Commun. ACM, Vol. 30, No. 3, pp. 195–196.
Rutishauser, H. (1967) Description of ALGOL 60. Springer-Verlag, New York.
Sammet, J. E. (1969) Programming Languages: History and Fundamentals. Prentice Hall, Englewood Cliffs, NJ.
Sammet, J. E. (1976) “Roster of Programming Languages for 1974–75.” Commun. ACM, Vol. 19, No. 12, pp. 655–669.
Schorr, H., and W. Waite. (1967) “An Efficient Machine Independent Procedure for Garbage Collection in Various List Structures.” Commun. ACM, Vol. 10, No. 8, pp. 501–506.
Scott, D. S., and C. Strachey. (1971) “Toward a Mathematical Semantics for Computer Languages.” In Proceedings, Symposium on Computers and Automata, J. Fox (ed.). Polytechnic Institute of Brooklyn Press, New York, pp. 19–46.
Scott, M. (2009) Programming Language Pragmatics, 3e. Morgan Kaufmann, San Francisco, CA.
Sebesta, R. W. (1991) VAX Structured Assembly Language Programming, 2e. Benjamin/Cummings, Redwood City, CA.
Sergot, M. J. (1983) “A Query-the-User Facility for Logic Programming.” In Integrated Interactive Computer Systems, P. Degano and E. Sandewall (eds.). North-Holland Publishing, Amsterdam.
Shaw, C. J. (1963) “A Specification of JOVIAL.” Commun. ACM, Vol. 6, No. 12, pp. 721–736.
Smith, J. B. (2006) Practical OCaml. Apress, Springer-Verlag, New York.
Sommerville, I. (2010) Software Engineering, 9e. Addison-Wesley, Reading, MA.
Steele, G. L., Jr. (1990) Common LISP The Language, 2e. Digital Press, Burlington, MA.
Stoy, J. E. (1977) Denotational Semantics: The Scott–Strachey Approach to Programming Language Semantics. MIT Press, Cambridge, MA.
Stroustrup, B. (1983) “Adding Classes to C: An Exercise in Language Evolution.” Software—Practice and Experience, Vol. 13, pp. 139–161.
Stroustrup, B. (1984) “Data Abstraction in C.” AT&T Bell Laboratories Technical Journal, Vol. 63, No. 8, pp. 1701–1732.
Stroustrup, B. (1986)《C++ 编程语言》。Addison-Wesley,马萨诸塞州雷丁。
Stroustrup, B. (1986) The C++ Programming Language. Addison-Wesley, Reading, MA.
Stroustrup, B. (1988)“什么是面向对象编程?”IEEE 软件,1988 年 5 月,第 10-20 页。
Stroustrup, B. (1988) “What Is Object-Oriented Programming?” IEEE Software, May 1988, pp. 10–20.
Stroustrup, B. (1994)《C++ 的设计和演进》。Addison-Wesley,马萨诸塞州雷丁。
Stroustrup, B. (1994) The Design and Evolution of C++. Addison-Wesley, Reading, MA.
Stroustrup, B. (1997)《C++ 编程语言》,第 3 版。Addison-Wesley,马萨诸塞州雷丁。
Stroustrup, B. (1997) The C++ Programming Language, 3e. Addison-Wesley, Reading, MA.
Sussman, GJ 和 GL Steele, Jr. (1975)“Scheme:扩展 Lambda 演算的解释器。”麻省理工学院 AI 备忘录 No. 349(1975 年 12 月)。
Sussman, G. J., and G. L. Steele, Jr. (1975) “Scheme: An Interpreter for Extended Lambda Calculus.” MIT AI Memo No. 349 (December 1975).
Suzuki, N. (1982)《指针‘旋转’分析》。《ACM 通讯》,第 25 卷,第 5 期,第 330-335 页。
Suzuki, N. (1982) “Analysis of Pointer ‘Rotation’.” Commun. ACM, Vol. 25, No. 5, pp. 330–335.
Syme, D.、A. Granicz 和 A. Cisternino。(2010) Expert F# 2.0。Apress、Springer-Verlag,纽约。
Syme, D., A. Granicz, and A. Cisternino. (2010) Expert F# 2.0. Apress, Springer-Verlag, New York.
Tatroe, K.、P. MacIntyre 和 R. Lerdorf。(2013) Programming PHP,第 3 版。O'Reilly Media,Sebastopol,加利福尼亚州。
Tatroe, K., P. MacIntyre, and R. Lerdorf. (2013) Programming PHP, 3e. O’Reilly Media, Sebastopol, CA.
Tanenbaum, AS (2005) 结构化计算机组织,第 5 版。Prentice Hall,新泽西州恩格尔伍德克利夫斯。
Tanenbaum, A. S. (2005) Structured Computer Organization, 5e. Prentice Hall, Englewood Cliffs, NJ.
Teitelbaum, T. 和 T. Reps. (1981)“康奈尔程序合成器:语法制导编程环境。” Commun. ACM,第 24 卷,第 9 期,第 563-573 页。
Teitelbaum, T., and T. Reps. (1981) “The Cornell Program Synthesizer: A Syntax-Directed Programming Environment.” Commun. ACM, Vol. 24, No. 9, pp. 563–573.
Tenenbaum, AM, Y. Langsam 和 MJ Augenstein。(1990) 使用 C 的数据结构。Prentice Hall,新泽西州恩格尔伍德克利夫斯。
Tenenbaum, A. M., Y. Langsam, and M. J. Augenstein. (1990) Data Structures Using C. Prentice Hall, Englewood Cliffs, NJ.
Thomas, D.、A. Hunt 和 C. Fowler。(2013 年)《Ruby 1.9 和 2.0 编程:实用程序员指南(Ruby 的各个方面)》。The Pragmatic Bookshelf,北卡罗来纳州罗利。
Thomas, D., A. Hunt, and C. Fowler. (2013) Programming Ruby 1.9 & 2.0: The Pragmatic Programmers Guide (The Facets of Ruby). The Pragmatic Bookshelf, Raleigh, NC.
Thompson, S. (1999) Haskell: 函数式编程技巧,第二版。Addison-Wesley,马萨诸塞州雷丁。
Thompson, S. (1999) Haskell: The Craft of Functional Programming, 2e. Addison-Wesley, Reading, MA.
Turner,D.(1986)“米兰达概述”。 ACM SIGPLAN Notices,第 21 卷,第 12 期,第 158-166 页。
Turner, D. (1986) “An Overview of Miranda.” ACM SIGPLAN Notices, Vol. 21, No. 12, pp. 158–166.
Ullman, JD (1998) ML 编程元素。ML97 版。Prentice Hall,新泽西州恩格尔伍德克利夫斯。
Ullman, J. D. (1998) Elements of ML Programming. ML97 edition. Prentice Hall, Englewood Cliffs, NJ.
van Emden, MH (1980) “McDermott 论 Prolog:反驳。”SIGART Newsletter,第 72 期,8 月,第 19-20 页。
van Emden, M. H. (1980) “McDermott on Prolog: A Rejoinder.” SIGART Newsletter, No. 72, August, pp. 19–20.
van Wijngaarden, A.、BJ Mailloux、JEL Peck 和 CHA Koster。 (1969)“关于算法语言 ALGOL 68 的报告。”数值数学,卷。 14,第 2 期,第 79-218 页。
van Wijngaarden, A., B. J. Mailloux, J. E. L. Peck, and C. H. A. Koster. (1969) “Report on the Algorithmic Language ALGOL 68.” Numerische Mathematik, Vol. 14, No. 2, pp. 79–218.
Wadler, P. (1998) “为什么没有人使用函数式语言。”ACM SIGPLAN Notices,第 33 卷,第 2 期,1998 年 2 月,第 25-30 页。
Wadler, P. (1998) “Why No One Uses Functional Languages.” ACM SIGPLAN Notices, Vol. 33, No. 2, February 1998, pp. 25–30.
Warren, DHD、LM Pereira 和 FCN Pereira。(1979)“DEC System-10 Prolog 用户指南。”临时论文 15。苏格兰爱丁堡大学人工智能系。
Warren, D. H. D., L. M. Pereira, and F. C. N. Pereira. (1979) “User’s Guide to DEC System-10 Prolog.” Occasional Paper 15. Department of Artificial Intelligence, University of Edinburgh, Scotland.
Watt, DA (1979)《Pascal 的扩展属性语法》。ACM SIGPLAN Notices,第 14 卷,第 2 期,第 60-74 页。
Watt, D. A. (1979) “An Extended Attribute Grammar for Pascal.” ACM SIGPLAN Notices, Vol. 14, No. 2, pp. 60–74.
Wegner, P. (1972)《维也纳定义语言》。《ACM 计算概览》,第 4 卷,第 1 期,第 5-63 页。
Wegner, P. (1972) “The Vienna Definition Language.” ACM Computing Surveys, Vol. 4, No. 1, pp. 5–63.
Weissman, C. (1967) LISP 1.5 Primer。Dickenson Press,加利福尼亚州贝尔蒙特。
Weissman, C. (1967) LISP 1.5 Primer. Dickenson Press, Belmont, CA.
Wexelblat, RL (ed.)。(1981)《编程语言史》。Academic Press,纽约。
Wexelblat, R. L. (ed.). (1981) History of Programming Languages. Academic Press, New York.
Wheeler, DJ (1950)“EDSAC 的项目组织和初始命令。” Proc. R. Soc. London,Ser. A,第 202 卷,第 573-589 页。
Wheeler, D. J. (1950) “Programme Organization and Initial Orders for the EDSAC.” Proc. R. Soc. London, Ser. A, Vol. 202, pp. 573–589.
Wilkes, MV (1952) “纯编程与应用编程”。《ACM 全国会议论文集》第 2 卷。多伦多,第 121-124 页。
Wilkes, M. V. (1952) “Pure and Applied Programming.” In Proceedings of the ACM National Conference, Vol. 2. Toronto, pp. 121–124.
Wilkes, MV、DJ Wheeler 和 S. Gill。(1951)《电子数字计算机程序的编写,特别参考 EDSAC 和子程序库的使用》。Addison-Wesley,马萨诸塞州雷丁。
Wilkes, M. V., D. J. Wheeler, and S. Gill. (1951) The Preparation of Programs for an Electronic Digital Computer, with Special Reference to the EDSAC and the Use of a Library of Subroutines. Addison-Wesley, Reading, MA.
Wilkes, MV、DJ Wheeler 和 S. Gill。(1957)《电子数字计算机程序的准备》,第二版。Addison-Wesley,马萨诸塞州雷丁。
Wilkes, M. V., D. J. Wheeler, and S. Gill. (1957) The Preparation of Programs for an Electronic Digital Computer, 2e. Addison-Wesley, Reading, MA.
Wilson, PR (2005) “单处理器垃圾收集技术。”可访问http://www.cs.utexas.edu/users/oops/papers.htm#bigsurv。
Wilson, P. R. (2005) “Uniprocessor Garbage Collection Techniques.” Available at http://www.cs.utexas.edu/users/oops/papers.htm#bigsurv.
Wirth,N.(1971)“编程语言Pascal。” Acta Informatica,第 1 卷,第 1 期,第 35-63 页。
Wirth, N. (1971) “The Programming Language Pascal.” Acta Informatica, Vol. 1, No. 1, pp. 35–63.
Wirth, N. (1973) 系统编程:导论。Prentice Hall,新泽西州恩格尔伍德克利夫斯。
Wirth, N. (1973) Systematic Programming: An Introduction. Prentice Hall, Englewood Cliffs, NJ.
Wirth,N.(1975)“论编程语言的设计”。信息处理 74(IFIP 大会 74 论文集)。北荷兰,阿姆斯特丹,第 386-393 页。
Wirth, N. (1975) “On the Design of Programming Languages.” Information Processing 74 (Proceedings of IFIP Congress 74). North Holland, Amsterdam, pp. 386–393.
Wirth, N. (1977)“Modula:一种模块化多道编程语言。”软件——实践与经验,第 7 卷,第 3-35 页。
Wirth, N. (1977) “Modula: A Language for Modular Multi-Programming.” Software—Practice and Experience, Vol. 7, pp. 3–35.
Wirth, N. 和 CAR Hoare。(1966)“对 ALGOL 发展的贡献”。Commun. ACM,第 9 卷,第 6 期,第 413-431 页。
Wirth, N., and C. A. R. Hoare. (1966) “A Contribution to the Development of ALGOL.” Commun. ACM, Vol. 9, No. 6, pp. 413–431.
Zuse, K. (1972)“Der Plankalkül”。手稿于 1945 年编写,发表于 Berichte der Gesellschaft für Mathematik und Datenverarbeitung,第 63 期(波恩,1972 年);第 3 部分,第 285 页。第 106 期(波恩,1976 年)第 42-244 页,除第 176-196 页外的所有内容的英文翻译。
Zuse, K. (1972) “Der Plankalkül.” Manuscript prepared in 1945, published in Berichte der Gesellschaft für Mathematik und Datenverarbeitung, No. 63 (Bonn, 1972); Part 3, 285 pp. English translation of all but pp. 176–196 in No. 106 (Bonn, 1976), pp. 42–244.
Absolute addressing
manual, 201
pointers and, 280
problems with, 38, 40
Abstract cells, 202, 284
Abstract class, 489, 492, 515, 517
in C#, 514–515
in C++, 507
in Java, 510
Abstract data types, 19, 236–237, 312, 485–486, 549, 560
in Ada, 479
in C#, 461–462
in C++, 453–459, 467–468
in C# 2005, 471
design issues for, 452–453
floating-point as, 450
in Java, 459–461
in Java 5.0, 468–470
language-defined, 237
object-oriented programming and, 352
parameterized, 466–471
in Ruby, 463–466
for stacks, 457, 467, 476
user-defined, 237, 450–451
Abstract method, 489, 514
in C#, 514–515
of a Java abstract class, 512
Abstraction, 2, 15, 19, 125, 198, 357
beginnings of data, 70–71
benefits of, 19, 80, 225
in BNF, 114
concept, 448–449
data, 19, 80, 366, 449–452
process, 366, 449
in Smalltalk, 84
subprogram, 367
Accept clause body, 553
Accept clauses, 553–559, 561
Access
deep, 437–439
in nested subprograms, 376, 430
in nonblocking synchronization, 569
shallow, 439–441
types, 273
ACM (Association for Computing Machinery), 51
Communications of the ACM, 53, 624
GAMM and, 51
Grace Murray Hopper Award, 454, 498
Turing Award of, 624
Activation record instance, 420, 422, 424–435, 437–439, 647–648
of static ancestors, 431–435
Activation records, 420
local_offset of a variable in, 426
in stack, 430
Active subprograms
in referencing environments, 224
in stack-dynamic local variables, 424
Actor tasks, 554, 570
Actual parameters, 203, 251, 343, 369–372, 374, 378–380, 383–385, 391, 398, 410, 418, 422, 424, 516, 634, 638, 645, 648, 661, 662, 667
Ad hoc binding, 393–394
Ad hoc polymorphism, 399
Ada, 12, 33, 55, 74, 198, 211, 373, 376, 391, 404, 422, 429, 543, 594
2005 version of, 82–83
abstract data types in, 479
assignment statement, 251
attribute grammar of, 130
Boolean operator in, 13
compilers, 23
concurrency in, 552–560
declarations of constrained anonymous types, 290
derived types, 290
design process, 79–80
evaluation of, 81–82
exception handling in, 14
exponentiation operator of, 304
functions of, 397
historical background of, 79
monitors in, 550
language overview of, 80–81
packages in, 80, 560
parentheses in, 251
pass-by-value-result of, 397
subrange types, 290
subtypes of, 293, 490
tasks, 552, 560–561, 570, 575
termination of selection construct, 12
type equivalence, 288, 291
Ada 83, 82, 550
Ada 95, 82–83, 543
constructing monitors, 550
pointers of, 396
Addresses, 258, 273, 277–280, 293, 341, 380, 420–422, 426, 680
of array elements, 293
in memory, 387
offset of, 265, 280
for out-mode parameters, 379
segment of, 280
of variables, 201–202
Aho, Al, 92
AI (artificial intelligence), 6, 93, 688
LISP in, 45–46, 48, 632, 653
in Perl, 93
Project at MIT, 46, 632
ALGOL 58, 113
design effort, 55
overview of, 52
report on, 53
ALGOL 60, 51, 57, 61, 62, 67
BNF in, 55
design process, 53–54
evaluation of, 54–56
example of, 55–56
overview of, 54
primary deficiency of, 71
ALGOL 68
design process, 71
evaluation of, 72–73
language overview of, 72
orthogonality in, 72
ALGOL Bulletin, 53
Aliases, 201–202, 276, 279, 380–381, 385, 392
Aliasing, 14–15, 201–202, 380–381, 391–392, 396–397, 410
Allocation, 46, 69, 72, 207–209, 217, 237, 245–247, 252–253, 275, 281–282, 375, 421
of objects, 492–493
storage, 46, 69, 208, 217, 246–247, 252, 283
Ambiguous grammars, 118–119, 333
AND operator, 153, 155
and then Boolean operator, 13
Anonymous variables, 273
ANSI (American National Standards Institute), 58
on C, 76
Minimal BASIC standard, 62
standardization of C++, 454, 498, 594
Antecedents, 144, 147, 683, 690–701
APES system, 710
APL (A Programming Language), 13, 21
origins and characteristics of, 69–70
Append function, 656
append operations, 700
Apple, 87, 88
Apply-to-all functional forms, 628, 649–650
Arithmetic expressions, 41, 58, 90, 117, 165, 332, 653, 711
associativity in, 305–307
characteristics of, 302–303
coercions in, 313–315
conditional, 308–309
design issues for, 303
grammar for, 175–176, 184, 188
in Lisp, 308
mixed-mode, 313, 314
operand evaluation order in, 309–311
parentheses in, 307
precedence in, 303–305
in Prolog, 711
purpose of, 303
referential transparency in, 310–311
in Ruby, 307–308
rules of operator evaluation order, 305–307
side effects in, 309–311
Array types, 99, 250–261
array initialization in, 254–255
array operations in, 255–256
design issues for, 250
evaluation of, 258
implementation of, 258–260
indices and, 251–252
jagged arrays in, 256
rectangular arrays in, 256
slices in, 257
subscript bindings in, 252–254
Artificial intelligence (AI). See AI (artificial intelligence)
ASCII (American Standard Code for Information Interchange), 138, 241, 689
Assemblies, .NET, 474
Assertions, 150
in axiomatic semantics, 143–144
in Java, 604–605
Assignment statements, 11, 18, 20, 47, 136, 150, 152, 218, 251, 253, 261, 277, 286
ambiguous grammar for, 118
attribute grammar for simple, 131
in axiomatic semantics, 145–147
compound assignment operators in, 320
conditional targets and, 320
in denotational semantics, 141
as expressions, 322–323
in functional programming languages, 323–324
grammar for simple, 117–118
mixed-mode, 324
multiple, 323
simple, 319–320
unary assignment data types in, 321
Association for Computing Machinery (ACM). See ACM (Association for Computing Machinery)
Associative arrays
definition, 261
implementation of, 262–263
structure and operations of, 261–263
Associativity, 127, 176, 181, 302
of operators, 115, 122–123
rules of operator evaluation order, 303, 305–307
Atomic propositions, 681–683, 685, 689–690, 698, 711
Atoms, 650
Lisp, 47, 629–630
predicate functions for symbolic, 641
Prolog, 689, 706, 708
AT&T Bell Laboratories, 455
Attribute computation functions, 129
Attribute grammars, 128, 292
basic concepts of, 129
computing attribute values in, 132–133
defined, 129–130
evaluation of, 133–134
examples of, 130–132
intrinsic attributes in, 130
static semantics and, 128–129
Attributes, 171, 198, 517
binding, 203–204
defined, 129
instance data as, 464
intrinsic, 130
of variables, names, 199, 201
Automatic generalization, 403
Automatic programming, 39
awk scripting language, 92
Axiomatic semantics, 680
assertions in, 143–144
assignment statements in, 145–147
evaluation of, 155
logical pretest loops in, 149–152
program proofs in, 152–155
selection in, 148–149
sequences in, 147–148
weakest preconditions in, 144–145
Axioms, 144, 155, 681, 684
B, language, 75
Babbage, Charles, 37, 80, 366
Backtracking, 685, 694, 697, 699, 704–705, 711
Backus, John, 40–41
BNF (Backus-Naur Form) (see BNF (Backus-Naur Form))
Fortran by, 18, 41, 624
FP (functional programming), 624–625
speedcoding system by, 39
Backward chaining, 693
Base class, 486
base prefix, 514
Basic (Beginner’s All-purpose Symbolic Instruction Code)
design process, 61–62
evaluation of, 62–63
example of, 63
BASIC-PLUS, 62
Bauer, Fritz, 51
BCD (binary coded decimal), 240
Bell Laboratories. See AT&T Bell Laboratories
BINAC computer, 38
Binary coded decimal (BCD), 240
Binary operators, 256, 303
Binary semaphore, 547
Binding, 632, 635, 656
of actual parameters to formal parameters, 370
ad hoc, 393–394
attributes to variables, 203–204
deep, 393–394, 437
definition, 203
difference between access, 437
dynamic type, 205–207
exceptions to handlers, C++, 595
exceptions to handlers, Java, 599–600
explicit heap-dynamic variables in, 209–210
implicit heap-dynamic variables in, 210–211
lifetime of, 207
shallow, 393–394, 437
stack-dynamic variables in, 208–209
static type, 314
static variables in, 207–208
static-type, 493, 512
storage, 207, 226
subscript, 252–254
type, 204–207
to a variable, 689
Binding time, 203
Blocked tasks, 542
Blocks
in Ruby, 668
for scope, 213–215
BNF (Backus-Naur Form), 53–54
describing lists in, 115
fundamentals of, 114–115
origins of, 113–114
Böhm, Corrado, 332, 359
Boolean abstract data types, 461–462
Boolean data types, 76, 332
Boolean expressions, 316–318, 661
boolean type variables, 76, 90, 241, 581
Borland JBuilder, 29
Bottom-up parsers, 173–174
LR parsers and, 186–190
parsing problem for, 184–186
shift-reduce algorithms for, 186
Bottom-up resolution, 693
Bound variables, 207, 634
Bounded wildcard types, 403
Bounds, 72, 259–260, 401
Boxing, 509
Breadth-first searches, 694
break statements, 338–339, 604
multiple-selection statements and, 338–340
in user-located loop control mechanisms, 350
Brinch Hansen, Per, 548–549, 551–552
Built-in iterators, 354
Business applications, 6
Business record computerization. See COBOL
Byron, Augusta Ada, 80
Byte code, 27
byte integer, 238
byte operands, 304
C, 198
compilers, 23
encapsulation in, 472
evaluation of, 76–77
expressivity in, 13
for statement, 345–347
historical background of, 75–76
language categories in, 20
limited dynamic strings of, 246
local variables in, 376
mixed-mode assignment in, 324
name and structure type equivalence of, 291
orthogonality in, 10
parameters, 384
pointers in, 278
popularity of, 3
portable system of, generally, 75–77
preprocessor instruction, 27
rules and exceptions in, 10
static specifier of, 208
struct data type, 263
switch statement of, 337
type checking in, 14
union constructs in, 271
user-located loop control in, 350–351
writability of, 13
C#, 164, 198, 237, 238, 240, 242–244, 247–250, 253, 254, 256, 257, 263, 270, 371–372, 376, 411, 453, 461–462, 539, 543, 549
4.0 version, 580
5.0 version, 99
2010 version, 206
abstract data types in, 461–462, 479
arrays of, 253–255, 388
assemblies, 473–474
Boolean types in, 241
classes, 479, 551
code segments, 349
decimal data types of, 240
declaration of a variable, 208
design process for, 98
dynamic binding, 514–515
encapsulation constructs in, 461–462
enumeration types in, 247, 248
evaluation of, 99–100, 515
event handling in, 613–616
example of, 100
for statement of, 347
general characteristics, 513
generic collection classes, 353
generic library classes, 353
goto, 355–356
heap-dynamic and stack-dynamic objects in, 210
inferencing process, 667
information hiding in, 462
inheritance in, 513–514
integer types of, 238
lambda expression in, 667–668
language overview of, 98–99
List, 253
method for displaying strings in, 339
mixed-mode expressions in, 324, 398
multiple selection structure, 359
name forms, 199–200
named constants of, 226
nested class, 515
nesting method in, 407
as .NET language, 98–100
object-oriented programming in, 513–515
objects of, 253
overloaded subprograms in, 398
parameter passing methods of, 379, 384, 387–388
pointers of, 279–281, 395
predefined overloaded subprograms of, 398
reference type of, 294
references of Java, 279–280
reflection in, 526–528
selection statement nesting in, 334
static semantics rule of, 339
string classes of, 243
struct data type, 263
support for concurrency, 581
switch statement, 339
threads, 570–575
a var declaration of a variable, 204
variable declarations in, 215–216
C++, 16, 20, 29, 50, 55, 74, 76
abstract data types in, 453–459, 467–468
arrays, 250, 253–254
assignment statement of, 203
Boolean types of, 241
classes, 499–509
code segment, 210
constant reference parameters, 389
constructors in, 457
declaration of a variable, 208
declarations in, 215, 217
Delphi, 88
design process for, 86
destructors in, 457
dynamic binding in, 504–507
dynamic binding of values, 226
encapsulation constructs in, 456, 472–473
enumeration types of, 248–249
evaluation of, 87, 507–509
exception handling in, 14, 594–598
for statement of, 216, 347
formal parameters of, 370, 384
functions, 222
general characteristics, 497
global variable of, 217
information hiding in, 456
inheritance in, 497–504
integer types of, 238
language overview of, 87
limited dynamic strings of, 246
local variables in, 376
mixed-mode assignment in, 324
names in, 199–200
namespaces, 475–476
nesting selectors in, 334
object-oriented programming in, 496–509
objects, 499
operators, 311, 313
overloaded subprograms in, 398
parameterized abstract data types, 467–468
pattern-matching capabilities of, 244
pointers in, 22, 99, 202, 210, 277–282, 393–394
reference parameters in, 384
reference types in, 278
static specifier of, 208
struct data type in, 263
switch statement of, 337
typedef in, 291
unary operator of, 275
union constructs in, 271
user-located loop control in, 350–351
C89, 76, 198, 215, 241, 332, 350, 386
C99, 76, 198, 199, 215, 217, 241, 317, 332, 345, 347, 350, 386
C# 2005, 399, 403, 411
assemblies in, 473–474
generic classes in, 471
generic functions in, 403
namespaces in, 475–476
parameterized abstract data types in, 471
Call chains, 426
Calls
dynamic binding of method, 519–521
indirect, 394–396
semantics of subprogram, 418
Cambridge Polish, 631
Cambridge University, 40, 75
Camel notation, 199
Caml, 50, 658
Canonical LR algorithm, 187
Captured variables, 668
CAR functions, 268, 639–640, 643–646, 650, 698, 702
Case expressions, 338–339
Case sensitivity, 200
case statements, 73, 316, 341
catch, 566–567, 594, 599–603, 606, 617
C-based languages, 36, 198–200, 211, 213, 252, 255, 305, 306, 308, 314–322, 335, 345–347, 367, 430, 436, 449, 589, 669
CBL (Common Business Language), 57
CDE (Solaris Common Desktop Environment), 29
CDR functions, 639–640, 643–646, 650, 698, 702
Central processing units (CPUs), 17–18, 418
CGI (Common Gateway Interface), 94
chain_offset, 431, 434, 439
Chambers, Craig, 508
char arrays, 242, 254
char type parameters, 400
Character string types
in C and C++, 254
design issues for, 242
evaluation of, 245
implementation of, 245–247
string length options in, 244–245
string operations in, 242–244
Character types, 241–242
Checked exceptions, 601
Child class, 486
Chomsky, Noam, 113–114
Church, Alonzo, 627
Cii Honeywell/Bull language, 80
Clark, K. L., 689
Clarke, L. A., 220
Class instance record (CIR), 519
Class methods, 487
Class variables, 487
Classes, 486
abstract, 489
base, 486
child, 486
derived, 486
of exceptions, 599
inner, 512
interlocked, 573, 582
local nested, 513
parent, 486
super, 486
wrapper, 90
Clausal form, 683–684
Clients, 450–453, 456, 459, 472, 476, 487, 503
Clocksin, W. F., 705
CLOS (Common LISP Object System), 653
Closed accept clause, 557
Closed-world assumption, 706
Closures, 405–407
CML (Concurrent ML), 576, 592
COBOL, 6, 23
computerizing business records in, 56–61
design process for, 57–58
evaluation of, 58–61
FLOW-MATIC and, 57
form of a record declaration, 264
historical background of, 57
Coercions, 90, 287, 291
in arithmetic expressions, 313–315
of deproceduring, 72
Colmerauer, Alain, 77, 688
Column major order, 259
Common Business Language (CBL), 57
Common Gateway Interface (CGI), 94
Common Intermediate Language (CIL), 474, 527
Common LISP, 49–50, 625, 651–653
backquote operator (`), 652
lists in, 268–269
Common LISP Object System (CLOS), 20, 653
Communicating Sequential Processes (CSP), 356–357, 360, 555
Communications of the ACM, 53, 624
Compatible types, 286
Competition synchronization, 539–541
in Ada, 557–559
in Java, 564–565
with monitors, 549
need for, 541
with semaphores, 547–548
Compiler design, 4, 129, 162, 203
BNF-based, 55
Compiler implementation, 23
Complex data types, 240
Compound assignment operators, 320
Compound terms, 681
Computer architecture, 17–19, 69, 198, 535–537, 624
Concurrency.
in Ada, 552–560
in C# threads, 570–575
categories of, 537–538
in Concurrent ML, 576
design issues for language support for, 543–544
explicit locks in, Java 5.0, 569–570
F# support for, 577–578
in functional languages, 575–578
fundamental concepts of, 539–543
in High-Performance Fortran, 578–580
introduction to, 534–539
in Java threads, 560–570
language design for, 543
message passing in, 551–552
monitors in, 549–551
in Multi-LISP, 575
multiprocessor architectures in, 535–537
nonblocking synchronization in, 569
protected objects in, 559–560
reasons for using, 538–539
semaphores in, 544–548
statement-level, 578–580
subprogram-level, 539–544
task termination, 555, 557
thread priorities in, 563–564
Concurrent ML (CML), 576, 592
Concurrent Pascal, 549
Conditional expressions, 46, 308–309, 343, 626, 655
Conditional targets, 320
Conjunctions, 690
CONS functions, 639–640
Consequents, 144, 683, 690
const constants, 226
Constructors, 453, 457
Context-free grammars, 113, 114, 710
Continuation, 596
Control expressions, 332
Control flow, 537, 596, 637–638
exception-handling, 593
paths, 330
statements, 92
Control statements, 330
Control structures, 2, 5, 331
Cooper, Alan, 64–65
Cooper, Jack, 80
Cooperation synchronization, 539
in Ada, 557
in Java, 565–568
with monitors, 549–550
with semaphores, 544–547
Coroutines, 71, 407–410
Costs of languages, 15–17
Counter-controlled loops, 344
in C-based languages, 345–347
design issues for iterative, 345
in functional languages, 348
in Python, 347–348
CPUs (central processing units), 17–18, 418
CSP (Communicating Sequential Processes), 356–357, 360, 555
Currie, Malcolm, 79
Curried functions, 658
Currying, 657
Cut, Prolog, 704–705
Dahl, Ole-Johan, 70–71
Dangling pointers, 275–276
Dangling references, 275
Data members, 456
Data structures, 352–355
Data types, 9–11, 37, 50.
Boolean, 241
character, 241–242
character string, 242–247
complex, 240
decimal, 240–241
definition, 236
descriptors, 237
enumeration types, 247–249
equivalence in, 288–291
floating-point, 239–240
floating-point as an abstract, 450
integer, 238–239
of a language, 198
in Lisp, 629–631
lists, 268–270
numeric, 238–241
ones-complement notation, 239
pointer, 273–280
primitive, 238–242
record, 263–266
reference, 278–285
string length options in, 244–245
string operations in, 242–244
in terms of precision and range, 239
theory and, 292–293
tuple, 266–267
twos-complement notation, 239
union, 270–272
user-defined, 72, 73, 236
user-defined abstract, 450–451
Data-based iterators, 360
Dead task, 542
Deadlocks, 543
Deallocation, 207, 492–493
Decimal data types, 240
Declaration order, 215–216
Declarative languages, 680, 686–687
Decorating parse trees, 132
Decrement fields, 573
Deep access, 437–439
Deep binding, 393
Deferred reference counting, 282
Definitions
of records, 264
in Scheme program, 634–636
in subprograms, 367–368
Delegates, 395–396
delete operator, 456, 457
in associative arrays, 261
C++, 210, 253, 276, 499
explicit deallocation using, 497
Delphi, 88, 98, 462
Denotational semantics, 137–142, 628, 669
assignment statements in, 141
evaluation of, 142
examples of, 138–139
expressions in, 140–141
logical pretest loops in, 141–142
state of programs and, 140
Department of Defense (DoD), 57, 58, 79–80
Dependents, 54, 55, 79–83
DEPOSIT subprogram, 545
Depth-first searches, 694
Dereferencing pointers, 277
Derivations, 115–117
Derived classes, 486, 500–501
Derived types, 289
Descriptors, 237
Design issues
for abstract data types, 452–453
for arithmetic expressions, 303
for array types, 250
for character string types, 242
for enumeration types, 247
for exception handling, 591–594
for functions, 396–397
for iterative counter-controlled statements, 345
for language support for concurrency, 543–544, 580
for logically controlled loop, 348–349
for multiple selectors, 336–337
for names, 199
for object-oriented languages, 489–494
particular to pointers, 274
specific to records, 264
for subprograms, 374
trade-offs, 21–22
for two-way selectors, 332
for union types, 271
Destructors, 457
Diamond inheritance, 491
Dictionaries, 97, 262
Dijkstra, Edsger, 356
guarded commands by, 356–359, 551
on PL/I, 68
semaphores by, 544
on synchronization operations, 549
Direct left recursion, 180
Discriminated unions, 271
Disjoint tasks, 539
dispose, 281
DLLs (dynamic link libraries), 65, 474
DO CONCURRENT constructs, 43
DoD (Department of Defense), 57, 58
Domain set, 625
Dot notation, 249, 264
Double floating-point data types, 239
do-while statements, 350
Dynabook, 83
Dynamic binding, 205, 488–489
in Ada, 82
in C#, 514–515
in C++, 87, 226, 504–507
in Java, 512
of messages to methods, 493, 495
of method calls to methods, 484, 519–521
in object-oriented programming, 488–489, 493
in Ruby, 517
in Smalltalk, 495–496
of subprogram calls, 82
Dynamic chains, 426
Dynamic dispatch, 488
Dynamic languages, 69–70
Dynamic length strings, 245
Dynamic link libraries (DLLs), 65, 474
Dynamic links, 422–423
Dynamic scoping, 220–222, 437–441, 632
Dynamic semantics, 129
axiomatic semantics as, 142–155
denotational semantics as, 137–142
operational semantics as, 134–137
Dynamic type binding, 205–207
Dynamic type checking, 286
Eager approach, 282
EBNF (Extended BNF), 125–127
ECMA (European Computer Manufacturers Association), 94
Edinburgh syntax, 689
Edwards, Daniel J., 632
Eich, Brendan, 94
Elaboration, 208
Elemental operators, Fortran 95+, 198
Elliptical references, 265
else-if clause, 341
Encapsulation constructs
in C, 472
in C#, 461–462, 473–474
in C++, 456, 472–473, 475–476
introduction to, 471–472
in Java, 476–477
naming, 474–478
in Ruby, 463, 477–478
entry clauses, 553
Enumeration constants, 247
Enumeration types, 247–250, 337
in C#, 249
in C++, 248–249
design issues for, 247–248
designs, 247–249
evaluation of, 249–250
in F#, 249
in Java 5.0, 249
in ML, 249
Environment pointers (EPs), 418
Epilogue of subprogram linkage, 419
EPs (Environment pointers), 418
EQ? function, 641
Equivalence, 288–291
Erasure rule, 181
Errors
in arithmetic expressions, 315
European Computer Manufacturers Association (ECMA), 94
EVAL functions, 632, 635–636, 650–651
Evaluation environments, 653
Event handling
in C#, 613–616
introduction to, 608–609
in Java, 609–613
Event listeners, 610–611
Events, 591–592, 608–610
Exception handling
in Ada, 14
basic concepts of, 589–591
in C++, 14, 594–598
design issues for, 591–594
introduction to, 588–594
in Java, 598–605
in Python, 605–607
in Ruby, 607–608
Exceptions, 315
Exclusivity of objects, 489–490
Executable images, 25
Execution efficiency, 670
Expert systems, 709–710
Explicit declarations, 204
Explicit heap-dynamic variables, 209–210
Explicit locks in, Java 5.0, 569–570
Explicit type conversions, 315
Expressions
assignment statements as, 322–323
Boolean, 143, 316–318, 661
in C#, 324, 398
case, 338–339
coercion in, 313–315
conditional, 46, 308–309, 343, 626, 655
control, 332
in denotational semantics, 140–141
errors in, 315
mixed-mode, 313
in recursive-descent parsers, 175–180
relational, 316
short-circuit evaluation in, 318–319
unambiguous grammar for, 120
Expressivity, 13
Extended accept clause, 556
Extended BNF (EBNF), 125–127
eXtensible Stylesheet Language Transformations (XSLT), 21
extern qualifiers, 217
F#, 29, 403–404, 625, 663–666
generic functions in, 403–404
generic library classes, 353
support for concurrency, 577–578
Fact statements, 689–690
Farber, J. D., 70
Fatbars, 357
Feature multiplicity, 8
FETCH subprogram, 545
Fetch-execute cycle, 18, 26
FGCS (Fifth Generation Computing Systems), 688
Fibonacci number, 659
Fields, 264–265
Fifth Generation Computing Systems (FGCS), 688
Filter, 656
Finalization, 593
finalize methods, 509
finally clauses, 603–604
FindAll method, 667
Finite automata, 165
Finite mappings, 293
Firm coercion, 72
First-order predicate calculus, 681
Fixed heap-dynamic arrays, 252–253
Fixed stack-dynamic arrays, 252–253
flex arrays, 72
float, 14, 90, 205, 286–289, 313, 324, 387, 394
in C, 472
in C#, 395, 461, 572
in C++, 595
coercions, 90
in strong typing, 287
in type checking, 14, 286–287
in type conversions, 313–315
float variable, 287
Floating-point data types, 239, 240, 450
Floating-point operations, 37, 39–40, 75, 287, 306
FLOW-MATIC, 57
FLPL (Fortran List Processing Language), 46
Flynn, Michael J., 536
for statements, 216
in C-based languages, 345–347, 352
in Java, 13, 352
in Python, 347–348
foreach statements
in C#, 99, 254, 353
of Perl, 360
Form, 12
Formal parameters, 368–370
Fortran, 5, 18, 61, 62, 66, 198
design process for, 41
evaluation of, 43–45
exponentiation in, 306
High-Performance, 578–580
historical background of, 40–41, 51, 251, 316, 330, 624
label parameters in, 590
nested subprograms in, 429
Read statement, 588–589
stand-alone statement, 320
subprograms in, 419
versions of, 41–43, 67, 211, 263, 373, 386
Fortran List Processing Language (FLPL), 46
Forward chaining, 693
FP (functional programming), 45–50, 623–671
Free Software Organization, 689
Free unions, 271
Fully attributed parse trees, 130
Fully qualified references, 265
Functional compositions, 648–649
Functional compositions in Scheme, 639–640
Functional forms, 627–628, 648–650
Functional programming (FP), 45–50, 623–671
Functional programming languages, 294, 310, 330, 369, 406
assignment statements in, 323–324
Common Lisp, 651–653
concurrency in, 575–578
Concurrent ML (CML), 576
F#, 577–578, 663–666
functional forms in, 627–628
fundamentals of, 628–629
Haskell, 658–663
imperative languages supporting, 666–669
imperative languages vs., 669–671
introduction, 624–625
LISP, 629–632
mathematical functions in, 625–628
Multi-LISP (ML), 575, 653–658
Scheme, 633–651
simple functions in, 626–627
Functions
of Ada, 397
attribute computation, 129
of C++, 222
of C# 2005, generic, 403
of C#, generic, 399–401
CAR, 268, 639–640, 643–646, 650, 698, 702
CDR, 639–640, 643–646, 650, 698, 702
composition, 627
CONS, 639–640
curried, 658
design issues for, 396–397
EVAL, 650
of F#, generic, 403–404
of Java 5.0, generic, 401–403
of JavaScript, 667
mathematical, functional programming languages, 625–628
in Scheme, 634–636
as subprograms, 373
Functors, 695
future constructs, 575
GAMM (German Society for Applied Mathematics and Mechanics), 51
Garbage collection, 97
Gates, Bill, 65
Genealogy of languages, 35
Generality, 16
Generate and test, 705
Generation, 112
Generators, 112–113, 660
Generic subprograms
in C++, 399–401
in C# 2005, 403
in F#, 403–404
in Java 5.0, 401–403
German Society for Applied Mathematics and Mechanics (GAMM), 51
getPriority methods, 563
Getter methods, 516
Glennie, Alick E., 40–41
Global scope, 217–219
GNOME, 29
Go, 85
Goals, 691–692
Google, 455, 537
Gosling, James, 89
GOTO, 188–190
Grammars.
ambiguous, 118–119, 333
context-free, 113, 114, 710
derivations and, 115–117
LL grammar class, 180–183
recognizers and, 127–128
for simple assignment statements, 117
for a small language, 116
unambiguous, 120–122, 124–125
van Wijngaarden, 72
Griswold, R.E., 70
Guarded commands, 356–359, 551
Guards, 544, 559, 659
GUIs (graphical user interfaces), 13, 608–609
C#, 614
Java, 609–610
UNIX and, 29
using Windows Forms, 614, 617
VB, 63
Hammond, P., 710
Handles, 185–187
Hansen, Brinch, 551
Harbison, Samuel P., 338
Hashes, 93, 96, 261, 353, 360, 471
Haskell, 369, 625, 658–663
Headed horn clauses, 686, 690–691
Header files, 473
Headless horn clauses, 686, 690–691
Heap-dynamic arrays, 252
Heap-dynamic variables, 209–211, 275, 280
Heaps, 209–210, 246
Heavyweight tasks, 539
Hejlsberg, Anders, 98
Hidden concurrency, 537
Higher-order functions, 627–628
High-Order Language Working Group (HOLWG), 79
High-Performance Fortran (HPF), 578–580
Hoare, C.A.R., 71
on Ada, 81
and ALGOL 60, 73
on language design, 13, 21, 359
message passing design, 551–552
on monitors, 549
Pascal by, 73
on pointers, 280
HOLWG (High-Order Language Working Group), 79
Hopper, Grace
award in name of, 454, 498
compiling systems by, 39
on programming languages, 57
Horn clauses, 686
HPF (High-Performance Fortran), 578–580
HTML (HyperText Markup Language), 406
introduction to, 6, 21
JavaScript and, 94–95, 162
JSP and, 101–102
PHP and, 96
XML and, 101
Hursley Laboratory, 67
Hybrid implementation systems, 26–27
HyperText Markup Language (HTML). see HTML (HyperText Markup Language)
Hypotheses, 686
IAL (International Algorithmic Language), 52
IBM, 46
701 computer, 39
704 computer, 40–41, 631, 639
700-series machines, 51
COMTRAN, 57
Fortran developed by, 40–45
mainframe design, 9–10
orthogonality and, 10
PL/I developed by, 66–69
SHARE and, 53
“The IBM Mathematical FORmula TRANslating System: FORTRAN,” 41
Identifiers, 111, 237
Identity operands, 304
IEEE Floating-Point Standard, 239
format, 239
IEEE Floating-Point Standard 754, 239
IF selector function, 637
if statements
assignments and, 322
in Extended BNF, 125
Java, 115, 179, 334
JSP and, 101–102
in multiple-selection statements, 341–343
nested, 339
in nesting selectors, 333–336
in recursive-descent parsers, 175
rule for, 125, 175
in selector expressions, 336
IFIP (International Federation of Information Processing), 73
if-then-else statements, 308, 342
Imperative programming languages, 397, 624, 626, 666–669
functional languages supporting, 666–669
functional languages vs., 669–671
Implementation methods
array types, 258–260
associative arrays, 262–263
character string types, 245–247
of compiler, 23
hybrid implementation systems, 26–27
Just-in-Time (JIT) implementation system, 27
parameter-passing methods, 382–383
pointer types, 280–285
record types, 265–266
reference types, 280–285
union types, 273
Implicit declarations, 204
Implicit heap-dynamic variables, 210–211
Implicit locks in, 569–570
import declarations, 477
include statements, 538
Incremental mark-sweep garbage collection, 284
Indicants, 72
Indices, 251–252
Inference rules, 692, 693, 709–710
for computing the precondition for a while loop, 149
general form of, 144
resolution, 684
as rule of consequence, 146
for selection statements, 145, 148
in sequences, 147, 152
Inferencing process, 692–695
Infix operators, 303
Information hiding
C#, 462
C++, 456
Ruby, 464–465
Information Processing Language (IPL), 45
Inheritance
C#, 513–514
C++, 497–504
Java, 510–512
Ruby, 517
Smalltalk, 495
Inherited attributes, 129
Initial values, 226
Initialization, 254–255
Initialization of objects, 494
Inner classes, 512
Inout mode parameter passing, 379
Instance data storage, 519
Instance methods, 463, 487
Instance variables, 463, 487
Instantiation, 689
Instruction-level concurrency, 535
int, 14, 90, 167–170, 247
in C, 203, 213, 254, 314, 472
in C#, 216, 395, 461
in C++, 222, 226, 278, 395, 398, 400, 458, 468, 596, 605
in F#, 404
in Java, 202, 225, 286, 287, 313, 468, 566–569, 572, 581
in ML, 654–655
in Python, 238
in type checking, 286, 386
unary minus operator and, 304
int integer, 238
int type parameters, 400
int variable, 286
integer, 238–239
byte, 238
int, 238
long, 238
short, 238
types of, 238–239
Intercession, 522
Interface abstract class, 510
Interlocked classes, 573, 582
International Algorithmic Language (IAL), 52
International Federation of Information Processing (IFIP), 73
International Standards Organization (ISO), 94, 241
Interpreter, 631–632
Intrinsic attributes, 130
Intrinsic condition queue, 565
Intrinsic limitations, 708
IPL (Information Processing Language), 45
is operators, 695
ISO (International Standards Organization), 94, 241
Iterative statements, 343–355
counter-controlled loops and, 344–348
data structures for, 352–355
design issues for, 345, 348–349
examples, 349–350
for statements, 345–348
logically controlled loops and, 348–350
user-located loop controls as, 350–351
Iverson, Kenneth P., 69
Jacopini, Giuseppe, 330, 332, 359
Jagged arrays, 256
JARs (Java Archives), 474
Java, 509–513, 539, 543
5.0, generic functions in, 401–403
5.0, parameterized abstract data types, 468–470
abstract data types, 459–461, 468–470
assertions in, 604–605
binding exceptions to handlers, 599–600
classes of exceptions, 599–600
competition synchronization in, 564–565
concurrency in Java threads, 560–570
cooperation synchronization in, 565–568
design choices, 600–602
design process, 89
dynamic binding in, 226, 512
evaluation of, 90–92, 461, 513, 570, 605
event handling with, 609–613
event model, 610–613
exception handlers of, 599–600
exception handling in, 598–605
explicit locks in, 569–570
expressivity in, 13
feature multiplicity in, 8
finally clauses, 603–604
for statements of, 216, 347
general characteristics, 509
imperative-based object-orientation of, 89–92
inheritance in, 510–512
integer types of, 238
language overview of, 89–90
mixed-mode assignment in, 324
names in, 199
nested classes, 512–513
nesting selectors in, 334
nonblocking synchronization in, 569
objects of, 509
overloaded subprograms in, 398
packages, 476–477
parameterized abstract data types in, 468–470
parameters, 384
pattern-matching capabilities of, 244
popularity of, 3
primitive scalar types and classes of, 528
priorities of threads, 563–564
reflection in, 523–525
semaphores in, 564
Swing GUI components, 609–610
switch statement of, 337
Thread class, 561–563
user-located loop control in, 350
while and do statements, 350
Java Archives (JARs), 474
Java Server Pages Standard Tag Library (JSTL), 21, 101
Java Virtual Machine, 27
JavaScript, 6, 20, 26, 29, 609, 670
anonymous function in, 667
dynamic type binding in, 205–206
functions for, 667
origins and characteristics of, 94–96
join methods, 561–562
JOVIAL, 53
JSP, 100–102
JSTL (Java Server Pages Standard Tag Library), 21, 101
Just-in-Time (JIT) compilers, 91, 98, 163
Just-in-Time (JIT) implementation system, 27
Kay, Alan, 83–84
Kemeny, John, 61–62
Kernighan, Brian, 92, 356
Keys, 261
Keyword parameters, 370
Keywords, 370
Knuth, Donald, 40, 55, 103, 187, 356
Korn, David, 92
Kowalski, Robert
on logic-based semantic networks, 710
Prolog by, 77, 688
Kurtz, Thomas, 61
Lambda calculus, 627
Lambda expressions, 50, 91, 627, 635
in C#, 667–668
in Java 8, 668
in Python, 668
in Scheme, 635
Language design
Ada, 80
ALGOL 58, 52–53
ALGOL 60, 53–56
ALGOL 68, 72
BASIC, 62
C#, 98–99
C++, 87
categories in, 20–21
COBOL, 56–61
computer architecture, 17–19
concurrency, 543
early design process, 51
for Fortran, 43
Hoare’s observation, 13, 21, 359
hybrid implementation system, 26
influences on, 17–20
Java, 89–90
PL/I, 67, 68
Prolog programs, 77–78
SIMULA 67, 71
Smalltalk, 84
trade-offs, 21–22
Language generators, 112–113
Language recognizers, 112
Laning and Zierler system, 41
Lattner, C., 87
Lazy approach, 282
Lazy evaluation, 661–663
LCF (Logic for Computable Functions), 50
Learning new languages, 2
Left factoring, 183
Left recursive grammar rules, 123
Left-hand side (LHS), 114–115, 123, 138, 173–174, 181, 184, 186, 188, 190, 192, 207
grammar rules for, 115, 123, 173
Leftmost derivations, 116
Lerdorf, Rasmus, 96
let
in F#, 664
in Haskell, 660
in ML, 214, 656
in Scheme, 214, 646–647
scope of, 215
Level numbers, 264
Lexemes, 111, 164
Lexical analysis, 163–171
lexical analyzer, 164–165
process, 164
Lifetime, 207–211
Lightweight task, 539
Limited dynamic length strings, 245
Linkers, 25, 420
Linking, 25
Linking and loading, 25
LISP, 205, 220, 222, 237.
allocation and deallocation in, 281
artificial intelligence and, 45–46
common, 651–653
data structures in, 47, 48, 629–631
data types in, 629–631
descendants of, 49–50
design goals of, 282
design process for, 46
evaluation of, 48–49
expressions in, 308
functional programming in, 47
implementation of, 639
interpreter in, 631–632
languages related to, 50
list processing and, 45–46
reflections in, 527
single-size allocation heap in, 281–282
syntax of, 48
List comprehensions, 270
LIST functions, Scheme, 268
Lists, 115
in Common LISP, 268–269
descriptions of, 115
functions of, 638–641
in Multi-LISP (ML), 269
predicate functions for, 641–642
in Prolog, 698–703
in Scheme language, 269
simple, 47, 630, 643–644, 646
types of, 268–270
Liveness, 543
LiveScript, 94
LL algorithms, 173
LL grammar class, 180–183
Load modules, 25
Loaders, 420
Local nested classes, 513
Local referencing environments, 375–376
Local variables, 217–219, 376, 426
Local_offset, 426
Locks, 569–570
Locks-and-keys approach, 281
Logic for Computable Functions (LCF), 50
Logic programming languages
applications of, 709–710
clausal form in, 684, 686, 691
expert systems and, 709–710
natural-language processing, 710
overview of, 686–688
predicate calculus for, 680–684
Prolog, 688–708
relational database management systems and, 709
resolution construction, 684–685
theorem-proving in, 684–686
Logical concurrency, 537
Logically controlled loops, 348–350
long integer, 238
Loop invariants, 149–153
Loop parameters, 344
Loop variables, 346
Loops
in axiomatic semantics, 149–152
counter-controlled, 344–348
logically controlled, 348–350
user-located, 350–351
Lost heap-dynamic variables, 276–277
LR parsers, 186–190
Lua, 279
arrays in, 254
enumeration types of, 249
L-value, 201
MAC OS X, 87
Mark-sweep garbage collection, 283
Markup languages, defined, 21
Markup-programming hybrid languages, 100–102
Massachusetts Institute of Technology (MIT), 41
match expressions, 272
Matching subgoals, 692
Matching type parameters, 595
Mathematical functions, 625–628
Matsumoto, Yukihiro, 97
Mauchly, John, 38
McCabe, F. G., 689
McCarthy, John, 46, 629, 631–632
McCracken, Daniel, 21
Meek coercion, 72
Mellish, C. S., 6, 705
Member functions, 456, 505
Memory cells, 198, 200, 202
Memory leakage, 276–277
Message interface, 486
Message protocol, 486
Message-passing model, 550
Messages
binding dynamically, 493, 495
in object-oriented languages, 486
passing of, 486, 489, 490, 498, 515, 551–552
Metadata, 522
MetaLanguage (ML), 50
元语言, 114
Metalanguages, 114
元符号, 126
Metasymbols, 126
方法调用, 519–512
Method calls, 519–512
方法, 486
Methods, 486
微软, 65岁
.NET 计算平台, 86, 98, 163
Visual Studio .NET, 29
Microsoft, 65
.NET computing platform, 86, 98, 163
Visual Studio .NET by, 29
米尔纳,罗宾, 50岁
Milner, Robin, 50
MIL-STD 1815, 80
MIL-STD 1815, 80
MIMD(多指令、多数据)计算机, 536
MIMD (Multiple-Instruction, Multiple-Data) computers, 536
明斯基,马文, 46岁
Minsky, Marvin, 46
米兰达, 50岁
Miranda, 50
麻省理工学院 (MIT), 41
人工智能项目, 46
LISP 46
Lisp, 629,633
Scheme 语言, 49
旋风电脑, 41
MIT (Massachusetts Institute of Technology), 41
AI Project, 46
LISP at, 46
Lisp at, 629, 633
Scheme language, 49
Whirlwind computer, 41
Mixed inheritance, 511
Mixed-mode assignment statements, 324
Mixed-mode expressions, 324
ML (MetaLanguage), 50
M-notation, 631
Modules, 477–478
Monitors, 549–551
MSDOS.exe, 64
Multicast delegates, 396
Multi-LISP (ML), 575, 653–658, 671
lists in, 269
Multiparadigm programming, 498
Multiple assignment statements, 323
Multiple inheritance, 487, 491–492
Multiple-Instruction, Multiple-Data (MIMD) computers, 536
Multiple-selection statements
design issues for, 336–337
examples of, 337–340
implementation of, 340–341
using if, 341–343
Multiprocessors, 535–537
Multithreaded program, 569, 574, 578
Name type equivalence, 288
Named constant, 224–226
Names
in C#, 199
in C++, 199–200
case sensitive, 200
in C-based languages, 199–200
design issues for, 199
forms, 199–200
in Java, 199, 200
keywords, 200
in PHP, 199
reserved words and, 200
in Ruby, 199
special words, 200
variable, 199, 201
Narrowing type conversions, 302
National Physical Laboratory, 67
Natural operational semantics, 135
Naur, Peter, 53, 113
NCC (Norwegian Computing Center), 70
Negation problem, Prolog, 706–708
Nested classes
in C#, 515
in Java, 512–513
object-oriented programming, 493–494
Nested list structures, 47, 630
Nested subprograms, 376, 429–435
Nesting classes, 494
Nesting selectors, 333–336
nesting_depth, 431
.NET languages, 27, 29, 98, 353, 474, 527, 574, 582, 613, 663
NetBeans, 29
Netscape, 94
Neumann, John von, 17
new, 458, 492, 509
for allocation of heap objects, 275
in C#, 461, 513–514
in C++, 209–210, 253, 456
in heap management, 281
in Java, 469
in Ruby, 516
New Programming Language (NPL), 67
Newell, Allen, 45
next iterators, 352
Nil values, 47, 273
Nonblocking synchronization, 569
nonlocal, 219
Nonstrict languages, 661
Nonterminal symbols, 114
Norwegian Computing Center (NCC), 70
NOT operators, 316, 691, 707
NPL (New Programming Language), 67
NULL, 642
Numeric data types, 238
Numeric predicate functions, 637
Numeric type
complex values, 240
decimal data types, 240–241
floating-point data types, 239–240
integer, 238–239
Nygaard, Kristen, 70
Object slicing, 493
Objective-C, 87–88
Object-oriented constructs, 519–521
Object-oriented languages, 19, 85, 206, 219, 279, 291, 360
allocation of objects in, 492–493
deallocation of objects in, 492–493
design issues in, 489–494
dynamic binding in, 493
exclusivity of objects in, 489–490
initialization of objects in, 494
multiple inheritance in, 491–492
nested classes in, 493–494
single inheritance in, 491–492
static binding in, 493
subclasses vs. subtypes in, 490–491
Object-oriented programming, 3, 20, 485, 663
in C#, 513–515
in C++, 85–88, 496–509
inheritance, 485–488
instance data storage, 519
in Java, 89–92, 509–513
message passing in, 551–552
in Objective-C, 87–88
in Ruby, 515–518
in Smalltalk, 83–85, 494–496
Stroustrup on, 498–499
support for, 494–518
Objects, 486
allocation of, 492–493
in C++, 497
of C#, 253
in concurrency, 559–560
deallocation of, 492–493
exclusivity of, 489–490
initialization of, 494
in Java, 509
in Ruby, 516
OCaml, 50, 205, 484, 625, 658, 663
Open accept clause, 557
Operand evaluation order, 309–311
Operational semantics, 134–137
evaluation of, 136–137
natural, 135
problems with, 135
process of, 135–136
structural, 135
Operator evaluation order, 303–309
Operator overloading, 8, 98, 311–313
Operator precedence, 119–122
Operator precedence rules, 304
Optimization, 15
or else statements, 334
OR operators, 271
Orthogonality, 9–11
otherwise, 659
Out mode parameter passing, 378
Output functions, 636
Overflow, 315
Overloaded operators, 311–313
Overloaded subprograms, 398
Overridden methods, 487, 491
override commands, 487, 514
Package scope, 476
Package specification, 552
Packages, 80, 476–477
Pairwise disjointness test, 182
Papert, Seymour, 83
Paradigms of programming, 498–499
Parameter profiles, 398
Parameterized abstract data types
in C++, 467–468
in C# 2005, 471
in Java 5.0, 468–470
Parameter-passing methods, 376
of common languages, 383–385
design considerations in, 389
examples of, 389–392
implementation models for, 377–382
implementation of, 382–383
semantic models of, 377
Parameters
actual, 369
array formal, 372
formal, 369
keyword, 370
in multidimensional arrays, 387–389
positional, 370
for subprograms, 368–372
subprograms as, 392–394
type checking, 385–387
Parametric polymorphism, 399
params, 99
Parent class, 486, 491
differences between subclasses and, 486–487
Parentheses, 307
Parse trees, 24–25, 117–118
Parsing, 25, 55, 119
bottom-up, 173–174, 183–190
complexity of, 174–175
introduction to, 171–172
LL grammar class in, 180–183
LR parsers for, 186–190
recursive-descent, 175–180
shift-reduce algorithms for, 186
top-down, 172–173
Partial correctness, 152
Partial evaluation, 658
Pascal, 55, 248, 276, 281, 289, 295, 376, 394, 549, 577
Concurrent, 549
evaluation of, 74–75
historical background, 73
Turbo, 98
Pass-by-assignment, 385
Pass-by-copy, 380
Pass-by-name, 381–382
Pass-by-reference, 380–381
Pass-by-result, 378–379
Pass-by-value-result, 379–380
Passed by value, 378
pcall constructs, 575
PDA (Pushdown automaton), 186
Peripheral processors, 535
Perl, 92–94, 341, 350
array assignments, 255
arrays, 92, 253
assignment statements in, 322
associative arrays in, 261
binary logic operators of, 317
built-in pattern-matching operations, 244
clause form, 334
coercion rules for mixed-mode assignment, 324
compound assignment operators of, 320
conditional targets on assignment statements, 320
dynamic scoping in, 220
enumeration types of, 249
expressions in, 303, 309
foreach statement, 99, 360
as a general-purpose language, 93
hashes, 96, 261, 262
hybrid implementation system, 27
mixed-mode assignment in, 324
multiple-source assignment statements in, 323
nesting selectors in, 334
origins and characteristics of, 92–94
passing parameters of, 384
strings in, 245
subscripting in, 251
unary arithmetic operators in, 321
Unicode in, 241
as a UNIX utility, 93
user-located loop control in, 350
variable names in, 199
variables in, 93
Perlis, Alan, 44, 51
PHP, 6, 26, 29, 217, 386, 670
access to HTML form data, 96
built-in pattern-matching operations, 244
foreach statement, 99
formal parameters of, 370
function definitions, 217
global variables of, 217–218
origins and characteristics of, 96
relational operators of, 316
scalar types of, 339
switch statement, 339
type binding in, 205, 286
variable names in, 199
Phrases, 185–186
Physical concurrency, 537
pipeline operators (|>), 665
Plankalkül, 36–37
PL/I, 66–69
design process, 67
evaluation of, 68–69
historical background, 66
language overview of, 67–68
Pointer types
in C and C++, 277–278
dangling, 275–276, 280–281
design issues with, 274
heap management and, 281–285
implementation of, 280–281
lost heap-dynamic variables in, 276–277
operations in, 274–275
problems with, 275–277
representations of, 280
Polonsky, I. P., 70
Polymorphic references, 488
Polymorphic subprograms, 399
Polymorphism, 399, 411, 488
Portability, 16
Positional parameters, 370
Postconditions, 143, 147
in assignment statements, 145–147
introduction to, 143
in logical pretest loops, 149–152
in program proofs, 152–155
in selection statements, 148–149
in sequences, 147–148
weakest precondition and, 144–145
Posttest, 344
Precedence, 303–305
Precision, 239
Predicate calculus
clausal form, 683–684
collections of propositions, 684–686
for logic programming languages, 680–684
propositions, 681–683
Predicate functions, 129, 637, 641–642
Predicate transformers, 150
Prefix operators, 321
Preprocessors, 27–29
Pretest, 349–350
Primitive data types, 9
Boolean, 241
character, 241–242
complex, 240
decimal, 240–241
floating point, 239
integer, 238–239
numeric, 238–240
Primitive numeric functions, 633–634
Principle of substitution, 490
Priorities of tasks, 563–564
Priorities of threads, 571
private, 464, 500–503, 512–513
in C#, 461
in C++, 456, 462
in Ruby, 464
Procedure-oriented programming, 20
Procedures, 372–373
Process abstraction, 449
Processes, 539
Producer-consumer problem, 540
Productions, 114
Program counter, 18
Program proofs, 152–155
Programming design methodologies, 19–20
Programming domains
artificial intelligence in, 6
business applications in, 6
scientific applications in, 5
Web software and, 6
Programming environments, 29
Prolog, 6, 21, 680–681, 688–708
arithmetic expression in, 695–698
basic elements of, 688–703
closed-world assumption in, 706
deficiencies of, 703–708
design process for, 77
evaluation of, 78
fact statements, 689–690
goal statements, 691–692
inferencing process of, 692–695
intrinsic limitations in, 708
language overview of, 77–78
list structures in, 698–703
negation problem in, 706–708
origin of, 688
resolution order control in, 703–705
rule statements, 690–691
terms, 689
Prolog++, 20, 78
Prologue of subprogram linkage, 419
Properties, C#, 461
Propositions, 681–683
protected access modifiers, 462
Protected objects, 550, 559–560
Protocol, 368, 450, 489, 495, 511, 514, 517
for event-handling methods, 610–611, 614, 617
function’s, 393–394
message, 486
of overloaded subprogram, 398
of a subprogram, 368
Prototypes, 368
Pseudocodes, 37–40
introduction to, 37–38
related work, 40
Short Code, 38–39
Speedcoding, 39
UNIVAC “compiling” system, 39
public, 464, 500–503, 512–513
in C#, 461
in C++, 456, 462
in Ruby, 464
Pure interpretation, 26
Pure virtual function, 506
Pure virtual method, 489
Pushdown automaton (PDA), 186
Python, 217
arrays in, 254, 269
associative arrays in, 262
binary arithmetic operations in, 405
compound assignment operators of, 320
control expressions in, 332
data types in, 238, 240
declarations in, 257
enumeration types of, 249
exception handling in, 605–607
formal parameters of, 370–372
function header of, 370–371
global variable in, 217, 218
global variables of, 376
for statement of, 347–348
hashes, 96, 264
list comprehension in, 270
nesting functions in, 219
origins and characteristics of, 96–97
parameter passing methods of, 385
polymorphism in, 399
range function, 270, 348
records, 263
reflective operations in, 527
scopes in, 223
selection statement, 335
selector statement, 341–342
slice reference, 257
strings of, 243–244
subprogram headers of, 367
subprograms of, 397, 411, 472
then and else clauses, 333
tuple type of, 266–267, 293
type binding in, 205
Unicode in, 241
user-located loop control in, 350–351
variables of, 250
Quantifiers, 682–683
Quasi-concurrency, 408
Quasi-concurrent subprograms, 537
Queries, 709–710
Quicksort algorithm, 661
QUOTE, 638, 652
Race conditions, 540
Radio buttons, 609–611, 614
raise statements, 606, 617
Raised exceptions, 591, 595, 598
RAND Corporation, 45
Range, 239
set, 625–626
Raw methods, 401
RDBMSs (Relational database management systems), 709
Read statement, 588
Readability, 7–8, 15, 249
Reader macros, 652
Readers, 625
Read-evaluate-print loops (REPLs), 633
readonly constants, 226
Ready task, 541
Real types, 133, 654–655
Recognition, 112
Record types
definition of records in, 264
evaluation of, 265
implementation of, 265–266
references to fields in, 264–265
Rectangular arrays, 256
Recursion, 427–429, 626
Recursive rules, 115
Recursive-descent parsers, 175–183
LL grammar class in, 180–183
recursive-descent subprogram, 175–180
ref type, F#, 577
Reference counters, 282
Reference parameters, 279, 384
Reference types
dangling pointers and, 280–281
heap management and, 281–285
implementation of, 281–285
of Java and C#, 280
representations of, 280
variables, 278–279
Referencing environments, 223–224
Referential transparency, 310–311, 628
Reflection, 522
in C#, 526–528
in Java, 523–525
Refutation complete, 685
Regular expressions, 12, 244
Regular grammars, 113, 165
Regular languages, 165
Relational database management systems (RDBMSs), 709
Relational expressions, 316
Relational operators, 316
Release semaphore subprogram, 544–548
Reliability, 14–15
Rendezvous, 552, 554
repeat, 18
REPLs (read-evaluate-print loops), 633
Reserved words, 199–200
Resolution, 684–686
bottom-up, 693
closed-world assumption in, 706
defined, 684
order control, 703–705
in Prolog, 692–695, 703, 705
top-down, 693
Resumes, 408, 410
Resumption, 592
Returned values, 397
Returns, 418
reverse functions, 702
Richards, Martin, 75
Right recursive grammar rules, 123
Right-hand side (RHS), 114, 123, 138, 174, 181, 186, 188, 207
Ritchie, Dennis, 75–76, 356
Rossum, Guido van, 96
Roussel, Phillippe, 77, 688, 711
Row major order, 259
Ruby
abstract data types in, 463–466
binary logic operators of, 317
built-in pattern-matching operations, 244
case expressions, 339, 341
case statement, 342
classes of, 463–464
compound assignment operators of, 320
constructors in, 463
dynamic binding, 517
encapsulation of, 463
enumeration types of, 249
evaluation of, 466, 517–518
exception handling in, 607–608
exponentiation in, 306
formal parameters of, 370, 372
forms of multiple-selection constructs, 339–340
general characteristics, 515–517
hashes, 262–263
information hiding in, 464–465
inheritance in, 517
iterators of, 360
modules, 477–478
object-oriented programming in, 515–518
objects in, 516
origins and characteristics of, 97–98
parameter passing methods of, 385
polymorphism in, 399
records, 263
selection statement, 335
subprogram headers of, 367
type binding in, 205
user-located loop control in, 350
Rule of consequence, 146
Rules, 114–115, 117, 120
run methods, 560–561, 570
Running task, 542
Run-time stacks, 424
Russell, Stephen B., 632
R-value, 202
Satisfying subgoals, 692
Scalable algorithms, 535
Schedulers, 541
Scheme language, 49
apply-to-all functional forms in, 649–650
code-building functions in, 650–651
control flow in, 637–638
defining functions in, 634–636
examples of function definitions in, 643–646
functional compositions in, 648–649
functional forms in, 648–650
interpreter in, 633
LET, 646–647
list functions in, 638–641
lists in, 269
numeric predicate functions in, 637
origins of, 633
output functions in, 636
predicate functions in, 641–642
primitive numeric functions in, 633–634
tail recursive functions in, 647–648
Schwartz, Jules I., 53
Scientific applications, 5
Scope
blocks for, 213–215
declaration order for, 215–216
dynamic scoping, 220–222, 437–441
global, 217–219
lifetime and, 222–223
named constants and, 224–226
referencing environments and, 223–224
static scoping, 220
in subprograms, implementing, 437–441
Scott, Dana, 142
Scripting languages, 92–98
JavaScript, 94–96
Perl, 92–94
PHP, 96
Python, 96
Ruby, 97–98
Scripts, 162
select statements, 555–556
Selection, 148–149
Selection statements
multiple-selection, 336–343
postconditions in, 148–149
two-way, 332
Selector expressions, 336
Semantic domains, 137
Semantics
dynamic, 134–155
introduction to, 110–111
natural operational, 135
operational, 134–137
static, 128–129
structural operational, 135
Semaphores, 544–548
Sentences, 111
Sentential forms, 116
Sequences, 147–148
Sergot, M. J., 710
Server tasks, 554
Servlet containers, 101
Setter methods, 464, 516
S-expressions, 632
Shallow access, 439–441
Shallow binding, 393–394
SHARE, 51, 53, 67
Shared inheritance, 491
Shaw, J. C., 45
Shift-reduce algorithms, 186
Short Code, 38–39
short integer, 238
Short Range Committee, 58
Short-circuit evaluation, 318–319
Side effects, 309–311, 396–397
SIGPLAN Notices, 80, 103
SIMD (Single-Instruction, Multiple-Data) computers, 536
Simon, Herbert, 45
Simple assignment statements, 130, 687
Simple functions, 626–627
Simple lists, 630, 643–644
Simple phrases, 185
Simplicity, 8–9, 13, 73–75, 163
SIMULA 67, 19, 384, 453, 485–486, 498, 500
design process for, 70–71
language overview of, 71
support for coroutines in, 71
Single inheritance, 487, 491–492
Single-Instruction, Multiple-Data (SIMD) computers, 536
Single-size cells, 281–282
sleep methods, 562, 577
Slices, 242, 257
Smalltalk, 86, 494–496
dynamic binding, 495–496
evaluation of, 496
general characteristics, 494–495
inheritance in, 495
SNOBOL, 69–70
Solaris Common Desktop Environment (CDE), 29
Source languages, 23, 26, 41
special, 50
Special words, 12
Speedcoding, 39
SQL (Structured Query Language), 709
Stack-dynamic arrays, 54, 252
Stack-dynamic local variables, 421–429
Stack-dynamic variables, 208–209
Stanford University, 73
start methods, 561
Start symbols, 115
State diagrams, 165
State of programs, 140
Statement-level concurrency, 535, 538, 578–580
Statement-level control structures
counter-controlled loops, 344–348
for statements, 345–348
guarded commands by, 356–359
iterative statements, 343–355
logically controlled loops, 348
two-way selection statements, 332
unconditional branch statement, 355–356
Static ancestors, 212
Static arrays, 252
Static binding, 204–205, 493
Static chaining, 430–435
Static length strings, 244
Static links, 430–431
static modifiers, 208, 253
Static parents, 212, 430–431
Static scoping, 49, 50, 376, 435, 439, 472, 633, 652, 653
Static semantics, 128–129
Static type bindings, 204–205
Static variables, 209, 210, 375
in binding, 207–208
static_depth, 431
Steele Jr., Guy L., 338
Steelman requirements document, 80
Stepsize, 344
Stichting Mathematisch Centrum, 96
Storage bindings, 207–211
Strachey, Christopher, 142
Strawman requirements document, 79–80
Strict programming languages, 661
Strong typing, 287
structs, 10, 36, 90, 449, 453, 513
in C#, 99, 462, 479
Structural operational semantics, 135
Structure type equivalence, 288
Structured Query Language (SQL), 709
Structures, 689–690
Subclasses, 486, 489–490
Subgoals, 704–706
Subprogram calls, 367
Subprogram definition, 367
Subprogram headers, 367
Subprogram linkage, 418
Subprogram-level concurrency, 539–544
Subprograms
in C++, 399–401
in C# 2005, 403
calling indirectly, 394–396
characteristics of, 366–367
closures, 374, 405–407
coroutines, 407–410
definitions in, 367–368
design issues for, 374, 396–397
in F#, 403–404
functions as, 372–373
fundamentals of, 366–373
generic, 374, 399–404
in Java 5.0, 401–403
local variables in, 375–376
multidimensional arrays and, 387–389
nested, 376
overloaded, 374, 398
parameter profile of, 368
parameter-passing methods, 376–392
parameters as, 392–394
parameters in, 368–372
procedures as, 372–373
protocol of, 368
user-defined overloaded data types in, 404–405
Subprograms, implementing
blocks in, 436–437
calls in, 418
deep access in, 437–439
dynamic scoping in, 437–441
of nested subprograms, 429–435
with recursion, 427–429
returns in, 418
shallow access in, 439–441
stack-dynamic local variables for, 421–429
static chaining, 430–435
without recursion, 425–427
Subrange types, in Ada, 290
Subscript bindings, 252–254
Subscripts, 251
Substring references, 242
subtype enumeration type, 290
Subtype polymorphism, 399
Subtypes, 293, 490–491
Sun Microsystems, 89, 94
Superclass, 486, 500
Swing GUI components, 609–610
Symbolic atoms and lists, 641–642
Symbolic logic, 681
Synchronization, 536, 539
of CML, 576
explicit locks as, 569–570
nonblocking, 569
of threads, 573–574
Synchronous message passing, 551–552
Syntactic domains, 137–139
Syntax
ambiguous grammars in, 118–119
analysis, 163
analyzer, 24, 28
associativity in, 122–123
BNF and, 113–114, 126
context-free grammars and, 113, 114
derivations in, 115–117
design, 12
in Extended BNF, 125–127
fundamentals of, 114–115
generators in, 112–113
grammars and, 115–117, 127–128
if-then-else statements, 308, 342
of Java, 110, 111
of JavaScript, 94
of LISP, 48
list descriptions in, 115
of ML, 50
operator precedence in, 119–122
parsing and, 117–118
of Python, 97
recognizers in, 112, 127–128
of Ruby, 98
of Smalltalk, 84, 86
unambiguous grammars in, 120, 124–125
Synthesized attributes, 129
Syracuse University, 684
System.Object, 98
Systems programming, 66, 75
Systems software, 22
Tail recursive functions, 647–648
Task(s), 539–544
concurrent execution of, 543
descriptors, 544
heavyweight, 539
lightweight, 539
states, 541–542
termination, 555, 557
task specifications, 552–553
Task termination, 555
Task-ready queue, 542, 562
Template functions, 399
Terminal symbols, 118, 138, 176
Terminal values, 344
terminate, 557
Terms, 689
Ternary operators, 309
Tests, 661
Texas A&M University, 454, 498
Text boxes, 609
Theorem-proving, 680, 684, 691
Theory of data types, 236
Thompson, Ken, 75
Threads, 539
in C#, 570–575
in Java, 560–570
priorities of, 563–564
synchronization of, 573–574
Thread class, 561–563
Threads of control, 537
throw statements, 620
Thrown exceptions, 599
throws clauses, 601
Tokens, 114, 164–166
Tombstones, 280–281
Top-down parsers, 172–173
Top-down resolution, 693
Total correctness, 152
Tracing models, 696
Trimming, 72
Tripod, 64, 65
try blocks, 569, 571
try clauses, 594–596, 600, 602–604
Tuples, 266–267
Turing machine, 631
Turner, David, 50
twos complement, 239
双向选择语句
子句形式, 332–333
控制表达式, 332
设计问题, 332–333
嵌套选择器, 333–336
选择器表达式, 336
Two-way selection statements
clause forms in, 332–333
control expressions for, 332
design issues for, 332–333
nesting selectors in, 333–336
selector expressions in, 336
Type bindings
dynamic, 205–207
static, 204–205
Type checking, 14, 207, 286–287
Type conversions, 313–315
Type, defined, 202
type enumeration type, 247–250
Type equivalence, 288–291
Type error, 286
Type inference, 204
typedef, 291
Unambiguous grammars, 120–122, 124–125
for if-else, 124–125
Unary assignment data types, 311–312, 321
Unary operators, 321
Unchecked exceptions, 601, 617
Unconditional branch statements, 355–356
undef, 93, 140–141
undefined, 254
Underflow, 315
Ungar, David, 508
Unicode, 51, 241
Unification, 685, 692
Uninstantiated variables, 708
union, 273
Union types, 270–273
design issues for, 271
discriminated vs. free unions, 271
evaluation of, 273
in F#, 271–272
implementation of, 273
UNIVAC, 38–39
UNIVAC Scientific Exchange (USE), 51
University of Aix-Marseille, 77, 688
University of Edinburgh, 50, 77, 688
University of Utah, 83
UNIX, 29, 93
Unlimited extent, 406
unsafe, C#, 90, 279
USE (UNIVAC Scientific Exchange), 51
User-located loop control mechanisms, 350–351
using directive, 476
val statements, 656
Value, 202
Value types, 273
van Rossum, Guido, 96
van Wijngaarden grammars, 72
var declarations, 204
Variables, 200–202, 237
addresses of, 201–202
explicit heap-dynamic variables, 209–210
implicit heap-dynamic variables, 210–211
names of, 201
scope of, 211–213
type of, 202
value of, 202
Variable-size cells, 284–285
VAX minicomputers, 9
VB (Visual BASIC), 13, 63
VDL (Vienna Definition Language), 136–137
Vector processors, 535–537
vehicle class, 486
Vienna Definition Language (VDL), 136–137
Virtual method tables (vtables), 519
virtual reserved word, 528
Visible variables, 211
Visual BASIC (VB), 13, 63
Visual Studio, 29
void, 10, 367, 371
void * pointers, 278
von Neumann architecture, 17–18, 26, 624
von Neumann bottlenecks, 26
vtables (virtual method tables), 519
wait semaphores, 544–548
Wall, Larry, 92
Weakest preconditions, 144–145
Web browsers, 534
Web software, 6
Weinberger, Peter, 92
Well-definedness, 16
Wheeler, David J., 40
when clause, 557
while, 90
Java, 110, 111
in logical pretest loops, 149–152
loops, 213, 350, 566, 573
statement, 318, 322, 349–350
Whitaker, Lt. Col. William, 79
Widening type conversions, 313
Widgets, 608–609
Wildcard types, 402–403
Wileden, J. C., 220
Wilkes, Maurice V., 40
Windows, 64, 65
Wirth, Niklaus, 73
Wolf, A. L., 220
Wrapper classes, 90
Writability, 13
Xerox Palo Alto Research Center (Xerox PARC), 84
XML (eXtensible Markup Language), 101–102
XSLT (eXtensible Stylesheet Language Transformations), 101
yacc, 128, 189
Zuse, Konrad, 36–37
The architecture consists of two units: the memory unit and the central processing unit (CPU). The CPU consists of the arithmetic and logic unit and the control unit. The control unit sends information to the arithmetic and logic unit. The input and output devices are connected to this unit. The results of the operations in the CPU are given as input to the memory unit. The memory unit stores both instructions and data and passes its output to the CPU.
The first layer is the bare machine. The second layer is the macroinstruction interpreter. The third layer is the operating system. The fourth layer is divided into eight sections: the Ada compiler, labeled virtual Ada computer; the Java virtual machine, labeled virtual Java computer; the C compiler, labeled virtual C computer; the .NET common language runtime; the Scheme interpreter, labeled virtual Scheme computer; the operating system command interpreter; the assembler, labeled virtual assembly language computer; and an incomplete section that indicates additional items in this layer. The fifth, incomplete layer includes the Java compiler, which sits on the Java virtual machine, and, on the .NET common language runtime, the VB.NET compiler, labeled virtual VB.NET computer, and the C# compiler, labeled virtual C# computer.
The source program is fed into the lexical analyzer, which passes lexical units to the syntax analyzer. The lexical analyzer and the syntax analyzer also supply input to the symbol table. The parse trees from the syntax analyzer and the output of the symbol table are fed into the intermediate code generator and semantic analyzer. The intermediate code it produces is sent for optional optimization and passed to the code generator, which also receives output from the symbol table. The machine language produced by the code generator and the input data are given to the computer, which produces the results.
The source program is fed into the lexical analyzer. The lexical units are passed to the syntax analyzer. The parse trees from the syntax analyzer are fed into the intermediate code generator. The intermediate code and the input data are passed to the interpreter. The output of the interpreter is the result.
In 1957, Fortran I was created, which led to the later Fortran versions and also inspired the ALGOL line. In 1958, Fortran II was created, which led to Fortran IV in 1962, followed by Fortran 77 in 1978, Fortran 90 in 1990, Fortran 95 in 1995, Fortran 2003 in 2003, Fortran 2008 in 2008, and Fortran 2015 in 2015. In 1958, ALGOL 58 was inspired by Fortran I, and its series continued in 1960 with ALGOL 60. In 1966 came ALGOL W, which developed into Pascal in 1971; Pascal in turn inspired three lines: Modula-2, ML, and Ada 83. On a different branch, ALGOL 68 was developed from ALGOL 60 in 1968, and C was partially inspired by ALGOL 68 in 1971. ALGOL 60 inspired BASIC in 1964, which led to QuickBASIC in 1988, then Visual BASIC in 1990, and finally Visual BASIC .NET in 2001. Modula-2 led to Oberon in 1988 and to Modula-3 in 1988; Modula-3 partially inspired Python in 1992. In 1963, SIMULA I was created, inspired by ALGOL 60, followed by SIMULA 67 in 1967 and Smalltalk 80 in 1980. Smalltalk partially inspired both Objective-C in 1984, which led to Swift in 2014, and Ruby in 1994. SIMULA 67 and Ada 83 together inspired Eiffel in 1990. In 1957, FLOW-MATIC was created, which led to COBOL in 1960 and then PL/I in 1964. In 1962, CPL was created and led to BCPL in 1969. In 1970, B was created, which led to C in 1971. The line then splits into two branches. The first branch starts with ANSI C (C89) in 1989 and goes on to Python in 1991, then Python 2.0 in 2000 and Python 3.0 in 2007; ANSI C also leads to C99 in 1999. The second branch from C goes to C++ and Java in 1994. Java led to Java 5.0 in 2004, Java 6.0 in 2006, Java 7.0 in 2009, and Java 8.0 in 2014. C++ also leads to C# in 2000, then to C# 2.0 in 2006, C# 3.0 in 2007, C# 4.0 in 2009, and C# 5.0 in 2012. In 1959, LISP was created. LISP then inspired two lines, Scheme and ML. Scheme was made in 1975 and led to Common LISP. ML was created in 1978 and led to Miranda in 1983, then Haskell in 1988. In 1963, SNOBOL was created, which led to Icon in 1982. SNOBOL also led to awk in 1978, which led to Perl in 1986. Perl led to PHP in 1994 and JavaScript in 1996. Perl also led to Ruby in 1994. Ruby led to Ruby 1.8 in 2004, which led to Ruby 1.9 in 2009.
The internal representation of the list A, B, C, D: a pointer points to the head node of the list. Four nodes hold the data parts A, B, C, and D, and each node's next pointer links it to its successor. Node D has no successor, so its pointer value is null. The internal representation of the list A, B, C, D, E, F, G: a pointer points to the head node A of the list. Node A is linked to node B. Node B is linked to node D and to node C. Node C has no successor, so its pointer value is null. Node D points to a dummy node for E whose next pointer is null. This node points to node E, which similarly points to a node with a null next pointer that in turn points to node F. Node F points to node G. Node G has no successor, so its pointer value is null.
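The two-pointer node layout described above can be sketched with a small cell type. This is only an illustration: the names `Cell`, `make_list`, and `to_text` are hypothetical, and reading the second list as the nested structure A (B C) D (E (F G)) is an assumption drawn from the description of dummy nodes.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class Cell:
    # first: an atom, or a pointer down to a sublist
    first: Union[str, "Cell", None]
    # rest: pointer to the successor cell; None plays the role of null
    rest: Optional["Cell"] = None


def make_list(items):
    """Build a chain of cells from a (possibly nested) Python list."""
    head = None
    for item in reversed(items):
        value = make_list(item) if isinstance(item, list) else item
        head = Cell(value, head)
    return head


def to_text(cell):
    """Walk the rest pointers, descending through first pointers for sublists."""
    parts = []
    while cell is not None:
        if isinstance(cell.first, Cell):
            parts.append("(" + to_text(cell.first) + ")")
        else:
            parts.append(str(cell.first))
        cell = cell.rest
    return " ".join(parts)


flat = make_list(["A", "B", "C", "D"])
nested = make_list(["A", ["B", "C"], "D", ["E", ["F", "G"]]])
print(to_text(flat))
print(to_text(nested))
```

In the nested case, the cell whose `first` field points at another cell corresponds to a dummy node in the description: it carries no atom of its own, only a pointer down to the sublist.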
The first level, assign, has three branches, from left to right: id, which represents A; equals; and expression. The second level, expression, has three branches, from left to right: id, which represents B; asterisk; and expression. The third level, expression, has three branches, from left to right: left parenthesis, expression, and right parenthesis. The fourth level, expression, has three branches: id, which represents A; plus; and expression. This last expression denotes the id that represents C.
The first tree has three levels. The first level, assign, has three branches, from left to right: id, which represents A; equals; and expression. The second level, expression, has three branches, from left to right: expression, which represents the id with attribute B; plus; and expression. The third level, expression, has three branches, from left to right: expression, which represents the id with attribute C; asterisk; and expression, which represents the id with attribute A. Similarly, the second tree has three levels. The first level, assign, has three branches, from left to right: id, which represents A; equals; and expression. The second level, expression, has three branches, from left to right: expression; asterisk; and expression, which represents the id with attribute A. The third level, expression, has three branches, from left to right: expression, which represents the id with attribute B; plus; and expression, which represents the id with attribute C.
The first level, assign, has three branches, from left to right: id, which represents A; equals; and expression. The second level, expression, has three branches, from left to right: expression, which leads to the term and factor of the id with attribute B; plus; and term. The third level, term, has three branches, from left to right: term, which leads to the factor of the id with attribute C; asterisk; and the factor of the id with attribute A.
The first level, assign, has three branches, from left to right: id, which represents A; equals; and expression. The second level, expression, has three branches, from left to right: expression; plus; and term, which leads to the factor of the id with attribute A. The third level, expression, has three branches, from left to right: expression, which leads to the term and factor of the id with attribute B; plus; and term, which leads to the factor of the id with attribute C.
First parse tree. The if statement in the first level has five branches in the second level: if, logic expression, statement, else, and statement. The first statement has another if statement as its branch in level three, which has three branches in the fourth level: if, logic expression, and statement. Second parse tree. The if statement in the first level has three branches in the second level: if, logic expression, and statement. The statement has another if statement as its branch in level three, which has five branches in the fourth level: if, logic expression, statement, else, and statement.
The first level, assign, has three branches to the second level, from left to right: variable, with attribute A; equals; and expression. The expression has three branches to the third level, from left to right: variable 2, with attribute A; plus; and variable 3, with attribute B.
The first level, assign, has three branches to the second level, from left to right: variable, with attribute A; equals; and expression. The expression has three branches to the third level, from left to right: variable 2, with attribute A; plus; and variable 3, with attribute B. The attribute A flows from the actual type to the expected type. The attributes A and B flow from the actual type toward the expression.
The first level, assign, has three branches to the second level, from left to right: variable, with attribute A; equals; and expression. A note beside variable reads actual type = real type. The expression has three branches to the third level, from left to right: variable 2, with attribute A; plus; and variable 3, with attribute B. A note beside expression reads expected type = real type, actual type = real type. Notes beside variable 2 and variable 3 read actual type = real type and actual type = integer type, respectively.
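The notes in this decorated tree can be reproduced with a few lines of code. The sketch below assumes one particular rule for the expression's actual type (integer only when both operands are integer, otherwise real); that rule, the function name `check_assignment`, and the type names `"int"` and `"real"` are illustrative assumptions rather than something stated in the description.

```python
def check_assignment(types, target, left, right):
    """Decorate `target = left + right` with expected and actual types.

    `types` maps variable names to "int" or "real" (hypothetical type names).
    Returns (expected_type, actual_type, types_agree).
    """
    # The expected type of the expression comes from the assignment target.
    expected = types[target]
    # Assumed rule: the sum is int only when both operands are int.
    if types[left] == "int" and types[right] == "int":
        actual = "int"
    else:
        actual = "real"
    return expected, actual, expected == actual


# A is real and B is integer, as in the notes on the tree for A = A + B.
print(check_assignment({"A": "real", "B": "int"}, "A", "A", "B"))
```

With A real and B integer, both the expected and the actual type of the expression come out as real, which matches the note beside the expression node.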
The first level, binary_number, has two branches to the second level: the bit 0 and binary_number. That binary_number has two branches to the third level: the bit 1 and binary_number. That binary_number has a single branch, the bit 1, in the fourth level.
The first level, binary_number, has two branches to the second level: the bit 0 and binary_number. The object 6 is attached to the first level. That binary_number has two branches to the third level: the bit 1 and binary_number. The object 3 is attached to the second level. That binary_number has a single branch, the bit 1, in the fourth level. The object 1 is attached to the third level.
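The objects 6, 3, and 1 attached to the tree are the values of the numerals 110, 11, and 1. A minimal sketch of how they arise, assuming each binary_number node denotes twice the value of its binary_number child plus its own bit (the function name `binary_values` is hypothetical):

```python
def binary_values(numeral):
    """Return the value attached to each binary_number node, root first."""
    last_bit = int(numeral[-1])
    if len(numeral) == 1:
        # A single bit denotes itself.
        return [last_bit]
    below = binary_values(numeral[:-1])
    # The node's value is twice its child's value plus the trailing bit.
    return [2 * below[0] + last_bit] + below


print(binary_values("110"))  # [6, 3, 1]
```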
The transitions and their actions are as follows. Start to id: Letter; addChar; getChar. Start to int: Digit; addChar; getChar. Unknown to done: t is assigned lookup(nextChar); getChar. The state id has a self-transition labeled Letter or Digit; addChar; getChar. The state int has a self-transition labeled Digit; addChar; getChar. A transition from id reads return lookup(lexeme). A transition from int reads return Int_Lit. Similarly, a transition from the done state reads return t.
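The state diagram translates fairly directly into a scanning loop. The following is a sketch rather than the book's own code: the token names `IDENT`, `INT_LIT`, and `UNKNOWN`, the small `KEYWORDS` table, and the function names are assumptions made for illustration.

```python
# Hypothetical reserved-word table consulted by lookup().
KEYWORDS = {"if": "IF_CODE", "while": "WHILE_CODE"}


def lookup(lexeme):
    """Classify a name as a reserved word or an ordinary identifier."""
    return KEYWORDS.get(lexeme, "IDENT")


def lex(text):
    """Return (token, lexeme) pairs for names, integer literals, and other characters."""
    tokens = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch.isspace():
            i += 1
        elif ch.isalpha():
            # id state: keep adding letters or digits to the lexeme.
            start = i
            while i < len(text) and text[i].isalnum():
                i += 1
            tokens.append((lookup(text[start:i]), text[start:i]))
        elif ch.isdigit():
            # int state: keep adding digits to the lexeme.
            start = i
            while i < len(text) and text[i].isdigit():
                i += 1
            tokens.append(("INT_LIT", text[start:i]))
        else:
            # Any other character is a one-character token.
            tokens.append(("UNKNOWN", ch))
            i += 1
    return tokens


print(lex("(sum + 47) / total"))
```

Each inner `while` loop plays the role of a self-transition in the diagram, and leaving the loop corresponds to the return transition out of that state.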
The first level, expression, leads to the second level, term. Term has three branches to the third level, from left to right: factor; slash; and factor, which leads to total. The first factor has three branches to the fourth level, from left to right: left parenthesis, expression, and right parenthesis. That expression has three branches, from left to right: term to factor to sum; plus; and term to factor to 47.
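A recursive-descent parser builds this shape one grammar rule per function. The sketch below assumes the common expression grammar (an expression is terms joined by + or -, a term is factors joined by * or /, and a factor is a name, an integer, or a parenthesized expression) and returns a condensed operator tree rather than the full parse tree; all function names are hypothetical.

```python
import re


def tokenize(text):
    """Split the input into names, integers, operators, and parentheses."""
    return re.findall(r"[A-Za-z_]\w*|\d+|[-+*/()]", text)


def parse_expr(tokens, pos):
    """expr -> term {(+ | -) term}"""
    node, pos = parse_term(tokens, pos)
    while pos < len(tokens) and tokens[pos] in ("+", "-"):
        op = tokens[pos]
        right, pos = parse_term(tokens, pos + 1)
        node = (op, node, right)
    return node, pos


def parse_term(tokens, pos):
    """term -> factor {(* | /) factor}"""
    node, pos = parse_factor(tokens, pos)
    while pos < len(tokens) and tokens[pos] in ("*", "/"):
        op = tokens[pos]
        right, pos = parse_factor(tokens, pos + 1)
        node = (op, node, right)
    return node, pos


def parse_factor(tokens, pos):
    """factor -> id | int_constant | ( expr )"""
    if pos >= len(tokens):
        raise SyntaxError("unexpected end of input")
    tok = tokens[pos]
    if tok == "(":
        node, pos = parse_expr(tokens, pos + 1)
        if pos >= len(tokens) or tokens[pos] != ")":
            raise SyntaxError("expected ')'")
        return node, pos + 1
    if tok[0].isalnum() or tok[0] == "_":
        return tok, pos + 1
    raise SyntaxError(f"unexpected token {tok!r}")


def parse(text):
    tokens = tokenize(text)
    node, pos = parse_expr(tokens, 0)
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return node


print(parse("(sum + 47) / total"))  # ('/', ('+', 'sum', '47'), 'total')
```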
An LR parser has two components: the parser code and the parsing table. The parser code is connected to the parse stack, and the parsing table is connected to the input. The parse stack contains the following elements: S0, X1, S1, an incomplete list, Xm, and Sm. A pointer labeled top points to Sm. The input consists of the following elements: a_i, a_(i+1), an incomplete list, a_n, and the dollar sign.
| State | Action: id | Action: + | Action: * | Action: ( | Action: ) | Action: $ | Goto: E | Goto: T | Goto: F |
|---|---|---|---|---|---|---|---|---|---|
| 0 | S5 | | | S4 | | | 1 | 2 | 3 |
| 1 | | S6 | | | | accept | | | |
| 2 | | R2 | S7 | | R2 | R2 | | | |
| 3 | | R4 | R4 | | R4 | R4 | | | |
| 4 | S5 | | | S4 | | | 8 | 2 | 3 |
| 5 | | R6 | R6 | | R6 | R6 | | | |
| 6 | S5 | | | S4 | | | | 9 | 3 |
| 7 | S5 | | | S4 | | | | | 10 |
| 8 | | S6 | | | S11 | | | | |
| 9 | | R1 | S7 | | R1 | R1 | | | |
| 10 | | R3 | R3 | | R3 | R3 | | | |
| 11 | | R5 | R5 | | R5 | R5 | | | |
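A table of this kind is driven by a short loop: shift pushes a state, reduce pops one state per right-hand-side symbol and then consults the goto part. The sketch below assumes the table belongs to the usual six-rule expression grammar (1: E → E + T, 2: E → T, 3: T → T * F, 4: T → F, 5: F → ( E ), 6: F → id); the rule numbering and the function name `lr_parse` are assumptions, and only states are kept on the stack.

```python
# Rule number -> (left-hand side, number of right-hand-side symbols); assumed grammar.
RULES = {1: ("E", 3), 2: ("E", 1), 3: ("T", 3), 4: ("T", 1), 5: ("F", 3), 6: ("F", 1)}

ACTION = {
    0: {"id": "S5", "(": "S4"},
    1: {"+": "S6", "$": "accept"},
    2: {"+": "R2", "*": "S7", ")": "R2", "$": "R2"},
    3: {"+": "R4", "*": "R4", ")": "R4", "$": "R4"},
    4: {"id": "S5", "(": "S4"},
    5: {"+": "R6", "*": "R6", ")": "R6", "$": "R6"},
    6: {"id": "S5", "(": "S4"},
    7: {"id": "S5", "(": "S4"},
    8: {"+": "S6", ")": "S11"},
    9: {"+": "R1", "*": "S7", ")": "R1", "$": "R1"},
    10: {"+": "R3", "*": "R3", ")": "R3", "$": "R3"},
    11: {"+": "R5", "*": "R5", ")": "R5", "$": "R5"},
}

GOTO = {
    0: {"E": 1, "T": 2, "F": 3},
    4: {"E": 8, "T": 2, "F": 3},
    6: {"T": 9, "F": 3},
    7: {"F": 10},
}


def lr_parse(tokens):
    """Return the sequence of rule numbers used to reduce the input."""
    stack = [0]                      # state stack; grammar symbols are omitted
    tokens = list(tokens) + ["$"]    # end-of-input marker
    pos = 0
    reductions = []
    while True:
        action = ACTION[stack[-1]].get(tokens[pos])
        if action is None:
            raise SyntaxError(f"unexpected {tokens[pos]!r} in state {stack[-1]}")
        if action == "accept":
            return reductions
        number = int(action[1:])
        if action[0] == "S":
            # Shift: push the new state and advance the input.
            stack.append(number)
            pos += 1
        else:
            # Reduce: pop one state per right-hand-side symbol, then follow goto.
            lhs, length = RULES[number]
            del stack[-length:]
            stack.append(GOTO[stack[-1]][lhs])
            reductions.append(number)


print(lr_parse(["id", "+", "id", "*", "id"]))  # [6, 4, 2, 6, 4, 6, 3, 1]
```

The printed list is the order in which rules are applied; read in reverse, it is a rightmost derivation of the input.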
The single-precision format is represented using a 1 by 3 grid whose cells represent the following: the sign bit, the exponent in 8 bits, and the fraction in 23 bits. The double-precision format is represented using a 1 by 3 grid whose cells represent the following: the sign bit, the exponent in 11 bits, and the fraction in 52 bits.
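The three fields can be pulled out of a real value with shifts and masks. This sketch uses Python's standard `struct` module to obtain the raw bits; the function name `float_fields` is hypothetical, and the exponent is returned in its stored (biased) form.

```python
import struct


def float_fields(x, double=False):
    """Return (sign, stored_exponent, fraction) for x in IEEE single or double format."""
    if double:
        bits = struct.unpack(">Q", struct.pack(">d", x))[0]
        exp_bits, frac_bits = 11, 52
    else:
        bits = struct.unpack(">I", struct.pack(">f", x))[0]
        exp_bits, frac_bits = 8, 23
    sign = bits >> (exp_bits + frac_bits)
    exponent = (bits >> frac_bits) & ((1 << exp_bits) - 1)
    fraction = bits & ((1 << frac_bits) - 1)
    return sign, exponent, fraction


print(float_fields(-0.75))             # (1, 126, 4194304)
print(float_fields(1.0, double=True))  # (0, 1023, 0)
```

For -0.75, which is -1.1 in binary times 2 to the power -1, the stored exponent is 127 - 1 = 126 and the fraction has only its top bit set.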
The column indices of the matrix are 0, 1, an incomplete section, j minus 1, j, an incomplete section, and n minus 1. The row indices of the matrix are 0, 1, an incomplete section, i minus 1, i, an incomplete section, and m minus 1. The location of element i, j is the cell where column j and row i meet, marked by a circle with an x in it.
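The position of the marked cell translates into an address computation. The sketch below assumes row-major storage and zero-based subscripts, so that i complete rows of n elements and then j further elements precede element (i, j); the function and parameter names are hypothetical.

```python
def element_address(base, i, j, n_cols, element_size):
    """Address of element (i, j) in a row-major matrix with n_cols columns."""
    # i full rows come first, then j elements within row i.
    return base + (i * n_cols + j) * element_size


# A matrix with 5 columns of 4-byte elements stored from address 1000.
print(element_address(1000, 2, 3, n_cols=5, element_size=4))  # 1052
```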
The fields read: record, name, type, offset, an incomplete section, name, type, offset, and address. There are two sets, called field 1 and field n. Field 1 includes name, type, and offset. Field n includes name, type, and offset.
The pointer ptr, with value 7080, points to the block at address 7080. This block holds the value 206. A note beside the block reads an anonymous dynamic variable. The block points toward the block for j, and the value 206 is assigned to j.
A pointer from block r points to action 1. Action 1 leads to actions 2 and 7. Action 2 leads to actions 3 and 6. Action 3 leads to actions 4 and 5. Action 7 leads to actions 8 and 10. Action 8 leads to action 9, and action 10 leads to actions 11 and 12. The order of node marking is shown by an outline drawn around the actions in dashed lines, with an x at each step.
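The numbering in this figure is the order in which a depth-first marking pass reaches each node. A minimal sketch, with the links taken from the description above and an extra unreachable node 13 added purely for illustration (the names `CELLS` and `mark` are hypothetical):

```python
# Links between numbered nodes; node 13 is an invented, unreachable node.
CELLS = {
    1: [2, 7], 2: [3, 6], 3: [4, 5], 7: [8, 10], 8: [9], 10: [11, 12],
    13: [],
}


def mark(cells, root):
    """Return the nodes reachable from root in the order they are marked."""
    order = []
    seen = set()

    def visit(cell):
        if cell in seen:
            return
        seen.add(cell)
        order.append(cell)
        # Follow each outgoing pointer, left to right.
        for child in cells.get(cell, ()):
            visit(child)

    visit(root)
    return order


print(mark(CELLS, 1))  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
```

Node 13 never appears in the result, so a collector using this pass would treat it as garbage.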
The caller is sub(a, b, c) and the callee is void sub(int x, int y, int z). The first semantic model shows the caller parameter a being passed to the callee parameter x; the arrow is labeled call. This is in-mode transmission. The second semantic model shows the callee parameter y being returned to the caller parameter b; the arrow is labeled return. This is out-mode transmission. The third semantic model shows the caller parameter c being passed to the callee parameter z, with the arrow labeled call, and the callee parameter z being passed back to the caller parameter c, with the arrow labeled return. This is inout-mode transmission.
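The three modes can be imitated in a language that passes only values by making the copies explicit. This is a simulation under stated assumptions, not how any particular language implements the modes: the callee body (doubling x, incrementing z) and the function name `caller` are invented for illustration.

```python
def sub(x, z):
    """x is in mode, y is out mode (produced here), z is inout mode."""
    y = x * 2    # out-mode value computed by the callee
    z = z + 1    # inout-mode value: received at the call, updated, and sent back
    x = 0        # changing the in-mode copy has no effect on the caller
    return y, z


def caller():
    a, b, c = 5, 0, 10
    # Call: a and c are copied in. Return: y and z are copied out to b and c.
    b, c = sub(a, c)
    return a, b, c


print(caller())  # (5, 10, 11)
```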
Three stacks main, stack and function sub with the following stack values: main, w, x, y, z and code; stack, value of a, value of b, value of c and address d; function sub, r e f to a, assign to b, r e f to c, assign to c and r e f to d; This is together grouped as code. The stack operations for the parameter passing methods are as follows: The main stack values w and y are passed to the stack value of a, and value of c. The address of at start is passed to the address d which is returned back to the main stack. The value of b and value of c are returned back to the main stack values x and y. The function sub parameters r e f to a, assign to b, r e f to c, assign to c and r e f to d are passed on to the stack values value of a, value of b, value of c, value of c and address of d respectively.
Illustration (a): two routines, A and B, where A receives the initial resume from master. Routine A contains a sequence of resume B statements and routine B contains a sequence of resume A statements. Control transfers from resume B to resume A and back from resume A to resume B, and the sequence continues. Illustration (b): the same two routines and the same transfers of control, except that B receives the initial resume from master.
Two routines, A and B, where A receives the initial resume from master. Routine A contains a sequence of resume B statements and routine B contains a sequence of resume A statements. Control transfers from resume B to resume A and back from resume A to resume B. Each sequence loops back to its first statement when it completes. The transfer from resume B to resume A is labeled first resume, and the transfer from resume A to resume B is labeled subsequent resume.
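The resume pattern in these coroutine figures can be approximated with Python generators. This is only a sketch: Python generators are asymmetric, so a small master loop stands in for the direct resume between A and B, and the names routine, master, and log are hypothetical.

```python
def routine(name, steps, log):
    """A coroutine: do one unit of work, then ask for the other routine to be resumed."""
    other = "B" if name == "A" else "A"
    for i in range(steps):
        log.append(f"{name}{i}")   # the work done between two resumes
        yield other                # plays the role of `resume other`


def master(first, steps=3):
    """Start the routine named `first`, then keep honoring each resume request."""
    log = []
    routines = {"A": routine("A", steps, log), "B": routine("B", steps, log)}
    current = first
    while True:
        try:
            # Each routine continues from the point where it last gave up control.
            current = next(routines[current])
        except StopIteration:
            break
    return log


print(master("A"))  # ['A0', 'B0', 'A1', 'B1', 'A2', 'B2']
print(master("B"))  # ['B0', 'A0', 'B1', 'A1', 'B2', 'A2']
```

The two calls correspond to the two cases in which master first resumes A or first resumes B. In both, each routine keeps its own position between resumes rather than restarting from the top.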
The data area has one cell in the main section and three cells in each of the lettered sections. The main section holds local variables. Sections A, B, and C each hold local variables, parameters, and a return address.
From top to bottom, the labels beside the eight cells read: sum, list [4], list [3], list [2], list [1], list [0], part, and total.
At point 1 there are two sections: the ARI (activation record instance) for fun1 and the ARI for main. The ARI for fun1 has five cells: local, local, parameter, dynamic link, and return (to main). The ARI for main has one cell: local. Variable names appear beside the cells: t beside the top cell, s beside the second, and r beside the third; the main cell is labeled p. A line runs from the dynamic link to the bottom of the main section.
At point 2 there are three sections: the ARI for fun2, the ARI for fun1, and the ARI for main. The ARI for fun2 has four cells: local, parameter, dynamic link, and return (to fun1). The ARI for fun1 has five cells: local, local, parameter, dynamic link, and return (to main). The ARI for main has one cell: local. Variable names appear beside the cells: y beside the top cell and x beside the one below it, t beside the fifth cell, s beside the sixth, r beside the seventh, and p beside the bottom (tenth) cell. Two lines run from the third cell to the top of the main section and from the eighth cell to the bottom of the main section.
At point 3 there are four sections: the ARI for fun3, the ARI for fun2, the ARI for fun1, and the ARI for main. The ARI for fun3 has two cells: parameter and dynamic link. The ARI for fun2 has four cells: local, parameter, dynamic link, and return (to fun1). The ARI for fun1 has five cells: local, local, parameter, dynamic link, and return (to main). The ARI for main has one cell: local. Variable names appear beside the cells. In fun3, q is beside the parameter cell. In fun2, y is beside the local cell and x is beside the parameter cell. In fun1, t is beside the first local cell and s is beside the second local cell. In main, p is beside the only cell. Three lines run from the dynamic link in fun3 to the bottom of the fun2 section, from the dynamic link in fun2 to the top of the main section, and from the dynamic link in fun1 to the bottom of the main section.
At the first call there are two sections: the first ARI (activation record instance) for factorial and the ARI for main. The first ARI for factorial has four cells: functional value (?), parameter (3), dynamic link (with a line to the bottom of the ARI for main), and return (to main). The ARI for main has one cell: local (?). The variable n is beside the parameter cell of the first ARI. An arrow labeled top points to the top line of the first ARI. The word value is beside the main (bottommost) cell.
At the second call there are three sections: the second ARI for factorial, the first ARI for factorial, and the ARI for main. The second ARI has four cells: functional value (?), parameter (2), dynamic link (with a line to the bottom of the first ARI), and return (to factorial). The first ARI has four cells: functional value (?), parameter (3), dynamic link (with a line to the bottom of the ARI for main), and return (to main). The ARI for main has one cell: local (?). The variable n is beside parameter 2 in the second ARI and parameter 3 in the first ARI. An arrow labeled top points to the top line of the second ARI. The word value is beside the main (bottommost) cell.
At the third call there are four sections: the third, second, and first ARIs for factorial and the ARI for main. The third ARI has four cells: functional value (?), parameter (1), dynamic link (with a line to the bottom of the second ARI), and return (to factorial). The second ARI has four cells: functional value (?), parameter (2), dynamic link (with a line to the bottom of the first ARI), and return (to factorial). The first ARI has four cells: functional value (?), parameter (3), dynamic link (with a line to the bottom of the ARI for main), and return (to main). The ARI for main has one cell: local (?). The variable n is beside parameter 1 in the third ARI, parameter 2 in the second ARI, and parameter 3 in the first ARI. An arrow labeled top points to the top line of the third ARI. The word value is beside the main (bottommost) cell.
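The growth and unwinding of the factorial stack can be sketched with an explicit list of ARIs. This is a simulation under stated assumptions, not how any particular implementation lays out memory: each ARI is a dictionary, the dynamic link is the list index of the caller's ARI, and the names factorial_trace, call, and snapshots are hypothetical.

```python
def factorial_trace(n):
    """Compute n! while keeping an explicit stack of activation record instances."""
    stack = [{"unit": "main", "local value": None}]    # ARI for main
    snapshots = []                                     # parameter cells after each call

    def call(n, return_to):
        ari = {
            "unit": "factorial",
            "functional value": None,         # shown as "?" until it is computed
            "parameter n": n,
            "dynamic link": len(stack) - 1,   # index of the caller's ARI
            "return to": return_to,
        }
        stack.append(ari)                     # the new ARI becomes the top
        snapshots.append([a["parameter n"] for a in stack[1:]])
        if n <= 1:
            ari["functional value"] = 1
        else:
            ari["functional value"] = n * call(n - 1, "factorial")
        stack.pop()                           # the ARI is removed on return
        return ari["functional value"]

    stack[0]["local value"] = call(n, "main")
    return stack[0]["local value"], snapshots


result, snapshots = factorial_trace(3)
print(result)     # 6
print(snapshots)  # [[3], [3, 2], [3, 2, 1]]
```

The snapshots list mirrors the first, second, and third calls: one ARI is added per call, and the functional values are filled in only as the calls complete in reverse order.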
At position 2 in factorial, third call completed, there are four sections: the third, second, and first ARIs (activation record instances) for factorial and the ARI for main. The third ARI has four cells: functional value (1), parameter (1), dynamic link (with a line to the bottom of the second ARI), and return (to factorial). The second ARI has four cells: functional value (?), parameter (2), dynamic link (with a line to the bottom of the first ARI), and return (to factorial). The first ARI has four cells: functional value (?), parameter (3), dynamic link (with a line to the bottom of the ARI for main), and return (to main). The ARI for main has one cell: local (?). The variable n is beside parameter 1 in the third ARI, parameter 2 in the second ARI, and parameter 3 in the first ARI. An arrow labeled top points to the top line of the third ARI. The word value is beside the main (bottommost) cell.
At position 2 in factorial, second call completed, there are three sections: the second ARI for factorial, the first ARI for factorial, and the ARI for main. The second ARI has four cells: functional value (2), parameter (2), dynamic link (with a line to the bottom of the first ARI), and return (to factorial). The first ARI has four cells: functional value (?), parameter (3), dynamic link (with a line to the bottom of the ARI for main), and return (to main). The ARI for main has one cell: local (?). The variable n is beside parameter 2 in the second ARI and parameter 3 in the first ARI. An arrow labeled top points to the top line of the second ARI. The word value is beside the main (bottommost) cell.
At position 2 in factorial, first call completed, there are two sections: the first ARI for factorial and the ARI for main. The first ARI has four cells: functional value (6), parameter (3), dynamic link (with a line to the bottom of the ARI for main), and return (to main). The ARI for main has one cell: local (?). The variable n is beside parameter 3 in the first ARI. An arrow labeled top points to the top line of the first ARI. The word value is beside the main (bottommost) cell.
At position 3 in main, final results, the ARI for main has one cell: local (?). An arrow labeled top points to the top line of the ARI. The word value is beside the main (bottommost) cell.
There are five sections, labeled ARI for sub1, ARI for sub3, ARI for sub2, ARI for bigsub, and ARI for main_2, with a label reading top at the top. The sub1 section has five cells: local, local, dynamic link (with a line to the bottom of sub3), static link (with a dotted line to the bottom of bigsub), and return (to sub3). The variable d is beside the top local and the variable a is beside the second local. The sub3 section has five cells: local, local, dynamic link (with a line to the bottom of sub2), static link (with a dotted line to the bottom of bigsub), and return (to sub2). The variable e is beside the top local and the variable c is beside the second local. The sub2 section has five cells: local, local, dynamic link (with a line to the bottom of bigsub), static link (with a dotted line to the bottom of bigsub), and return (to bigsub). The variable e is beside the top local, the variable b is beside the second local, and the variable x is beside the parameter. The bigsub section has six cells: local, local, local, dynamic link (with a line to the bottom of main), static link, and return (to main). The variable c is beside the top local, the variable b is beside the second local, and the variable a is beside the third local. The main section has one cell: local. The variable x is beside the local.
Three unlabeled cells precede the block-variable cells, which are labeled e, d, c, b and g, and a and f. The cells for locals read z, y, and x. The bottommost cell is a large shaded block that reads activation record instance for main.
There are five sections, labeled ARI for sub3, ARI for sub2, ARI for sub1, ARI for sub1, and ARI for main. The sub3 section has four cells: local, local, dynamic link (with a line to the bottom of sub2), and return (to sub2). The variable z is beside the top local and the variable x is beside the second local. The sub2 section has four cells: local, local, dynamic link (with a line to the bottom of sub1), and return (to sub1). The variable w is beside the top local and the variable v is beside the second local. The sub1 section has four cells: local, local, dynamic link (with a line to the bottom of sub1), and return (to main). The variable w is beside the top local and the variable v is beside the second local. The main section has two cells: local and local. The variable u is beside the top local and the variable v is beside the second local.
There are five stacks, labeled u, v, x, z, and w. The names in the stack cells indicate the program units of the variable declarations. The stack for u has one cell, labeled main. The stack for v has three cells, labeled sub1, sub1, and main. The stack for x has two cells, labeled sub3 and sub2. The stack for z has one cell, labeled sub3. The stack for w has three cells, labeled sub2, sub1, and sub1.
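The per-variable stacks can be reproduced with a short simulation of shallow access. The declaration table below is inferred from the cell labels in the five stacks, and the call sequence main, sub1, sub1, sub2, sub3 is an assumption that is consistent with those labels; the function names enter, leave, and visible are hypothetical.

```python
from collections import defaultdict

# Variables declared by each program unit, inferred from the stack labels.
DECLARES = {
    "main": ["u", "v"],
    "sub1": ["v", "w"],
    "sub2": ["w", "x"],
    "sub3": ["x", "z"],
}

stacks = defaultdict(list)   # one stack per variable name


def enter(unit):
    """On subprogram entry, push a new instance of each variable it declares."""
    for name in DECLARES[unit]:
        stacks[name].append(unit)


def leave(unit):
    """On subprogram exit, pop the instances pushed at entry."""
    for name in DECLARES[unit]:
        stacks[name].pop()


def visible(name):
    """Dynamic-scope lookup: the instance on top of the variable's own stack."""
    return stacks[name][-1]


for unit in ["main", "sub1", "sub1", "sub2", "sub3"]:
    enter(unit)

print(stacks["v"])   # ['main', 'sub1', 'sub1']  (bottom of the stack first)
print(visible("w"))  # sub2
```

A reference never searches the chain of callers; it reads the top of one stack. The cost moves to subprogram entry and exit, where the pushes and pops happen.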
The first block, public class A, contains a method draw and an incomplete line of code. The second block, public class B extends A, contains a method draw and an incomplete line of code; an upward arrow points from this block to the first block. The client block contains the following code. Line 1: A myA = new A(); Line 2: myA.draw(); Line 3: an incomplete line of code.
A stack holds the value variables b1 and a1, each pointing to its corresponding objects. Variable b1 points to its objects X and Y, which are labeled data area. Variable a1 points to its object X, which is labeled data area.
The first illustration has three classes, Shape, Circle, and Rectangle, with the members virtual void draw() = 0, void draw(), and void draw(), respectively. The classes Circle and Rectangle inherit from the class Shape. Similarly, a class Square can also be inherited. The second illustration shows the types of the pointers and their bindings to objects. The Shape* pointer is shown as a rectangular block, ptr_shape, bound to its object Square, which has the member void draw(). Similarly, the Rectangle* pointer is shown as a rectangular block, rect, bound to its object Rectangle, which has the member void draw(). Each binding is drawn as a continuous arrow pointing to the block. The Square* pointer is drawn as a dot.
The class instance record for A has three fields: a vtable pointer, a, and b. The vtable pointer points to the vtable for A, which has two fields that point to the code for A's draw and the code for A's area. Similarly, the class instance record for B has five fields: a vtable pointer, a, b, c, and d. Its vtable pointer points to the vtable for B, which has three fields that point to the code for A's area, the code for B's draw, and the code for B's sift.
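The relationship between a class instance record and its vtable can be sketched with plain dictionaries. This is a simplification: a real vtable is an array indexed by a fixed slot offset, whereas this sketch keys the table by method name, and the helper names new_a, new_b, and send are hypothetical.

```python
def a_draw(obj):
    return "A.draw"


def a_area(obj):
    return "A.area"


def b_draw(obj):
    return "B.draw"


def b_sift(obj):
    return "B.sift"


# One table per class. B overrides draw, inherits area from A, and adds sift.
VTABLE_A = {"draw": a_draw, "area": a_area}
VTABLE_B = {"draw": b_draw, "area": a_area, "sift": b_sift}


def new_a():
    """Class instance record for A: a vtable pointer plus the fields a and b."""
    return {"vtable": VTABLE_A, "a": 0, "b": 0}


def new_b():
    """Class instance record for B: A's fields followed by c and d."""
    return {"vtable": VTABLE_B, "a": 0, "b": 0, "c": 0, "d": 0}


def send(obj, method):
    """Dynamically bound call: follow the object's vtable pointer, then call."""
    return obj["vtable"][method](obj)


for obj in (new_a(), new_b()):
    print(send(obj, "draw"), send(obj, "area"))
# A.draw A.area
# B.draw A.area
```

The caller uses the same send for both objects. Which draw runs depends only on the table that the record points to, which is how the inherited area and the overridden draw coexist in B's table.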
The class instance record for C has five fields: a vtable pointer, a, a second vtable pointer, b, and c. The first vtable pointer points to C's vtable for the C and A part, which has three fields that point to the code for C's init, the code for C's fun, and the code for C's dud. The second vtable pointer points to C's vtable for the B part, which has one field that points to the code for B's sum.
The value of TOTAL, 3, is shown as a continuous line, with the values 4 and 6 written on the line. Task A is shown as a continuous line with the markings Fetch TOTAL, Add 1, and Store TOTAL. Similarly, Task B is shown as a continuous line with the markings Fetch TOTAL, Multiply by 2, and Store TOTAL. Time is shown as a continuous arrow pointing to the right.
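The effect of interleaving the two tasks can be checked with a small deterministic simulation. It uses no real threads: the schedule string lists which task performs its next step (fetch, compute, store), so each outcome is reproducible. The function name run and the schedule encoding are hypothetical.

```python
def run(schedule, total=3):
    """Run the fetch/compute/store steps of tasks A and B in the given order."""
    ops = {"A": lambda v: v + 1, "B": lambda v: v * 2}
    reg = {}                      # each task's private copy of TOTAL
    step = {"A": 0, "B": 0}       # next step each task will perform
    for task in schedule:
        if step[task] == 0:
            reg[task] = total                   # Fetch TOTAL
        elif step[task] == 1:
            reg[task] = ops[task](reg[task])    # Add 1 or Multiply by 2
        else:
            total = reg[task]                   # Store TOTAL
        step[task] += 1
    return total


print(run("AAABBB"))  # 8: A finishes before B starts
print(run("BBBAAA"))  # 7: B finishes before A starts
print(run("ABABAB"))  # 6: both fetch 3; A stores 4, then B stores 6
print(run("ABABBA"))  # 4: both fetch 3; B stores 6, then A stores 4
```

The third schedule reproduces the values 3, 4, and 6 from the timeline. Because both tasks fetch before either stores, A's update is lost.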
The flow between the states is as follows: a new task is created; the task then becomes ready; a scheduled task begins running; a running task whose time slice expires returns to the ready state; a completed task moves to the dead state; a running task that performs input or output moves to the blocked state and, when the input or output is completed, returns to the ready state.
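The state flow can be written down as a transition table. This is a sketch only: the event name "admitted" for the arrow from new to ready is hypothetical, as are the function names next_state and run_task.

```python
# Transitions between task states; the event name "admitted" is an assumption.
TRANSITIONS = {
    ("new", "admitted"): "ready",
    ("ready", "scheduled"): "running",
    ("running", "time slice expiration"): "ready",
    ("running", "input/output"): "blocked",
    ("blocked", "input/output completed"): "ready",
    ("running", "completed"): "dead",
}


def next_state(state, event):
    """Return the state reached from `state` on `event`, or raise ValueError."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"no transition from {state!r} on {event!r}") from None


def run_task(events, state="new"):
    """Apply a sequence of events to a task, starting in the new state."""
    for event in events:
        state = next_state(state, event)
    return state


final = run_task(["admitted", "scheduled", "input/output",
                  "input/output completed", "scheduled", "completed"])
print(final)  # dead
```

The table has no entry leaving dead and no direct entry from blocked to running, so a blocked task must pass through ready and be scheduled again before it can run.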
The processes SUB 1 and SUB 3 feed the insert block, which inserts data, and the removed data is fed back to the processes. Insert sends information to the buffer, and remove receives information from the buffer.
The first timeline diagram depicts the occurrence, task example waits for sender. The timeline is represented using a continuous arrow pointing towards the right. The task example consists of the following steps in the time line: Wait at accept, Accept and wait at accept. Wait at accept is represented by dashed lines. The sender consists of the following steps, sends message, rendezvous and continue execution. The rendezvous section is represented in dashed lines. The second timeline diagram depicts the occurrence, sender waits for task example. The timeline is represented using a continuous arrow pointing towards the right. The task example consists of the following steps in the time line: Busy, Accept and wait at accept. Wait at accept is represented by dashed lines. The sender consists of the following steps, sends message and is suspended, rendezvous and continue execution. The rendezvous section is represented in dashed lines.
Task A has two jobs 1 and 2 and task B has two jobs 3 and 4, labeled, accept clauses. Both tasks consist of a task body connected to each other by a bidirectional arrow. A note beside reads, B period Job 3 value.
The flow transfers from the executing code to the exception handlers. The executing code consists of the following: Line 1, incomplete. Line 2, begin. Line 3, incomplete. Line 4, some statement. Line 5, incomplete. Line 6, end semicolon. Line 7, incomplete. The exception handlers consist of the following: Line 1, when incomplete line of code. Line 2. begin. Line 3.incomplete. Line 4, end semicolon. The exception is raised in, some statement. The control exception to handler binding, flows from some statement to the when statements. The when then continues back to the some statement in the executing code, the end, two incomplete sections, and a termination on the side.
The internal representation of the list A, B, C, D: A pointer points to the head node in the list. Four nodes with data parts A, B C and D and the next pointer linked to its successor. The node D does not have a successor and the pointer value is null. The internal representation of the list A left parenthesis B C right parenthesis D left parenthesis E left parenthesis F G right parenthesis right parenthesis: A pointer points to the head node A in the list. Node A is linked to Node B. Node B is linked to node D and Node C. The node C does not have a successor and the pointer value is null. Node D points to a dummy node for E with a null value. This node points to the node E which similarly points to a null node and points to Node F. This node F points to node G. The node G does not have a successor and the pointer value is null.
The internal representation of the result of the Lisp expression C O N S single quote A single quote left parenthesis right parenthesis, which after the construct operation becomes A: a head node with data part A and a null pointer value. The internal representation of the result of C O N S single quote A single quote left parenthesis B C right parenthesis, which after the construct operation becomes A B C: three nodes with data parts A, B, and C, each with its next pointer linked to its successor. The node C does not have a successor, and its pointer value is null. The internal representation of the result of C O N S single quote left parenthesis right parenthesis single quote left parenthesis A B right parenthesis, which after the construct operation becomes left parenthesis right parenthesis A B: a head node whose data part points to another node with data part NIL. The head node points to a node with data part A, and this node points to a node with data part B with a null pointer. The internal representation of the result of C O N S single quote left parenthesis A B right parenthesis single quote left parenthesis C D right parenthesis, which after the construct operation becomes left parenthesis A B right parenthesis C D: the head node points to two nodes with data parts A and C. Node C points to a node with data part D with a null pointer value. Node A points to a node with data part B with a null pointer value.
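The four CONS results described above can be reproduced with a small model of cons cells. This is a sketch under stated assumptions: Cell, NIL, make_list, and show are hypothetical names, Python's None stands in for the null pointer, and only proper lists are handled.

```python
from dataclasses import dataclass
from typing import Any, Optional

@dataclass
class Cell:
    car: Any                  # data part: an atom or another list
    cdr: Optional["Cell"]     # pointer to the successor, or None for null

NIL = None                    # the empty list

def cons(first, rest):
    """Build a new list whose first element is `first`, followed by `rest`."""
    return Cell(first, rest)

def make_list(*items):
    """Build a proper list from the given items."""
    result = NIL
    for item in reversed(items):
        result = cons(item, result)
    return result

def show(value):
    """Render an atom or a proper list in parenthesized form."""
    if value is NIL:
        return "()"
    if isinstance(value, Cell):
        parts = []
        while value is not NIL:
            parts.append(show(value.car))
            value = value.cdr
        return "(" + " ".join(parts) + ")"
    return str(value)

print(show(cons("A", NIL)))                                  # (A)
print(show(cons("A", make_list("B", "C"))))                  # (A B C)
print(show(cons(NIL, make_list("A", "B"))))                  # (() A B)
print(show(cons(make_list("A", "B"), make_list("C", "D"))))  # ((A B) C D)
```

In each case cons allocates exactly one new cell; the second argument is shared rather than copied, which is why the result of the last call has the list (A B) hanging off the data part of the head node.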
Each goal is depicted by a rectangular box with four ports call, fail, exit and redo. The control enters the block through the call port or the redo port. If the goal succeeds, the control leaves through the exit port. If the goal fails, the control leaves through the fail port. If two sections are connected as in the example the redo and fail sections connect and the exit and call sections connect.
The table has 9 rows and 4 columns. The columns have the following headings from left to right. Characteristic, Readability criteria, Writability criteria, Reliability criteria. The row entries are as follows. Row 1. Characteristic, Simplicity. Readability criteria, yes. Writability criteria, yes. Reliability criteria, yes. Row 2. Characteristic, Orthogonality. Readability criteria, yes. Writability criteria, yes. Reliability criteria, yes. Row 3. Characteristic, Data types. Readability criteria, yes. Writability criteria, yes. Reliability criteria, yes. Row 4. Characteristic, Syntax design. Readability criteria, yes. Writability criteria, yes. Reliability criteria, yes. Row 5. Characteristic, Support for abstraction. Readability criteria, no. Writability criteria, yes. Reliability criteria, yes. Row 6. Characteristic, Expressivity. Readability criteria, no. Writability criteria, yes. Reliability criteria, yes. Row 7. Characteristic, Type checking. Readability criteria, no. Writability criteria, no. Reliability criteria, yes. Row 8. Characteristic, Exception handling. Readability criteria, no. Writability criteria, no. Reliability criteria, yes. Row 9. Characteristic, Restricted aliasing. Readability criteria, no. Writability criteria, no. Reliability criteria, yes.
The table has 7 rows and 6 columns. The columns have the following headings from left to right. Design issue of language, Smalltalk, C++, Java, C#, Ruby. The row entries are as follows. Row 1. Design issue of language, exclusivity of objects. Smalltalk, all data are objects. C++, primitive types plus objects. Java, primitive types plus objects. C#, primitive types plus objects. Ruby, all data are objects. Row 2. Design issue of language, are subclasses subtypes. Smalltalk, they can be and usually are. C++, they can be and usually are if the derivation is public. Java, they can be and usually are. C#, they can be and usually are. Ruby, no subclasses are subtypes. Row 3. Design issue of language, single and multiple inheritance. Smalltalk, single only. C++, both. Java, single only but some effects with interfaces. C#, single only but some effects with interfaces. Ruby, single only but some effects with modules. Row 4. Design issue of language, allocation and deallocation of objects. Smalltalk, all objects are heap allocated, allocation is explicit and deallocation is implicit. C++, objects can be static, stack dynamic, or heap dynamic, allocation and deallocation are explicit. Java, all objects are heap dynamic, allocation is explicit and deallocation is implicit. C#, all objects are heap dynamic, allocation is explicit and deallocation is implicit. Ruby, all objects are heap dynamic, allocation is explicit and deallocation is implicit. Row 5. Design issue of language, dynamic and static binding. Smalltalk, all method bindings are dynamic. C++, method binding can be either. Java, method binding can be either. C#, method binding can be either. Ruby, all method bindings are dynamic. Row 6. Design issue of language, nested classes. Smalltalk, no. C++, yes. Java, yes. C#, yes. Ruby, yes. Row 7. Design issue of language, initialization. Smalltalk, constructors must be explicitly called. C++, constructors are explicitly called. Java, constructors are explicitly called. C#, constructors are explicitly called. Ruby, constructors are explicitly called.
The table has 8 rows and 2 columns. The columns have the following headings from left to right, Lexemes and Tokens. Row 1, Lexemes, index and Tokens, identifier. Row 2, Lexemes, = and Tokens, equal underscore sign. Row 3, Lexemes, 2 and Tokens, i n t underscore literal. Row 4, Lexemes, asterisk and Tokens, m u l t underscore o p. Row 5, Lexemes, count and Tokens, identifier. Row 6, Lexemes, plus sign and Tokens, plus underscore o p. Row 7, Lexemes, 17 and Tokens, i n t underscore literal. Row 8, Lexemes, semicolon and Tokens, semicolon.
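The lexeme and token pairs in the table above can be produced by a small regular-expression scanner. The sketch below is one possible illustration, not the text's own analyzer: the token names are taken from the table, while the patterns, the tokenize function, and the statement index = 2 * count + 17; (reassembled from the listed lexemes) are assumptions.

```python
import re

# (token name, pattern); names follow the table, patterns are assumptions.
TOKEN_SPEC = [
    ("int_literal", r"\d+"),
    ("identifier", r"[A-Za-z_]\w*"),
    ("equal_sign", r"="),
    ("mult_op", r"\*"),
    ("plus_op", r"\+"),
    ("semicolon", r";"),
    ("skip", r"\s+"),          # whitespace separates lexemes and is dropped
]
PATTERN = re.compile("|".join(f"(?P<{name}>{rx})" for name, rx in TOKEN_SPEC))

def tokenize(source):
    """Return a list of (lexeme, token) pairs for the source string."""
    pairs = []
    pos = 0
    while pos < len(source):
        match = PATTERN.match(source, pos)
        if match is None:
            raise ValueError(f"unexpected character {source[pos]!r} at {pos}")
        if match.lastgroup != "skip":
            pairs.append((match.group(), match.lastgroup))
        pos = match.end()
    return pairs

for lexeme, token in tokenize("index = 2 * count + 17;"):
    print(f"{lexeme:6} {token}")
```

The loop prints the eight rows of the table in order, from index identifier through ; semicolon.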
The C statement is. Line 1, for left parenthesis expression 1 semicolon expression 2 semicolon expression 3 right parenthesis left brace. Line 2, ellipses. Line 3, right brace. Its meaning is. Line 1, expression 1 semicolon. Line 2, loop colon if expression 2 = = 0 go to out. Line 3, ellipses. Line 4, Expression 3 semicolon. Line 5, go to loop. Line 6, out colon ellipses.
The table has 8 rows and 2 columns. The columns have the following headings from left to right, Token and Lexeme. Row 1, Token, I D E N T and Lexeme, result. Row 2, Token, ASSIGN underscore O P and Lexeme, equal sign. Row 3, Token, I D E N T and Lexeme, old sum. Row 4, Token, SUB underscore O P and Lexeme, minus sign. Row 5, Token, I D E N T and Lexeme, value. Row 6, Token, DIV underscore O P and Lexeme, slash. Row 7, Token, I N T underscore L I T and Lexeme, 100. Row 8, Token, SEMICOLON and Lexeme, semicolon.
The table has 14 rows and 3 columns. The columns have the following headings from left to right, Stack, Input, and Action. Row 1, Stack, 0, Input, i d + i d asterisk i d dollar sign, and Action Shift 5. Row 2, Stack, 0 i d 5, Input, + i d asterisk i d dollar sign, and Action Reduce 6, use GO TO left bracket 0 comma F right bracket. Row 3, Stack, 0 F 3, Input, + i d asterisk i d dollar sign, and Action Reduce 4, use GO TO left bracket 0 comma T right bracket. Row 4, Stack, 0 T 2, Input, + i d asterisk i d dollar sign, and Action Reduce 2, use GO TO left bracket 0 comma E right bracket. Row 5, Stack, 0 E 1, Input, + i d asterisk i d dollar sign, and Action Shift 6. Row 6, Stack, 0 E 1 + 6, Input, i d asterisk i d dollar sign, and Action Shift 5. Row 7, Stack, 0 E 1 + 6 i d 5, Input, asterisk i d dollar sign, and Action Reduce 6, use GO TO left bracket 6 comma F right bracket. Row 8, Stack, 0 E 1 + 6 F 3, Input, asterisk i d dollar sign, and Action Reduce 4, use GO TO left bracket 6 comma T right bracket. Row 9, Stack, 0 E 1 + 6 T 9, Input, asterisk i d dollar sign, and Action Shift 7. Row 10, Stack, 0 E 1 + 6 T 9 asterisk 7, Input, i d dollar sign, and Action Shift 5. Row 11, Stack, 0 E 1 + 6 T 9 asterisk 7 i d 5, Input, dollar sign, and Action Reduce 6, use GO TO left bracket 7 comma F right bracket. Row 12, Stack, 0 E 1 + 6 T 9 asterisk 7 F 10, Input, dollar sign, and Action Reduce 3, use GO TO left bracket 6 comma T right bracket. Row 13, Stack, 0 E 1 + 6 T 9, Input, dollar sign, and Action Reduce 1, use GO TO left bracket 0 comma E right bracket. Row 14, Stack, 0 E 1, Input, dollar sign, and Action Accept.
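The trace above can be reproduced by a table-driven shift-reduce driver. The sketch below is hedged: the ACTION and GOTO tables are not given in this description, so the widely used LR table for the grammar E -> E + T | T, T -> T * F | F, F -> ( E ) | id is assumed, with rules numbered 1 to 6 in that order. The function lr_parse and the helper _reduce_row are hypothetical names.

```python
# Rule number -> (left-hand side, length of right-hand side); assumed numbering.
RULES = {
    1: ("E", 3),  # E -> E + T
    2: ("E", 1),  # E -> T
    3: ("T", 3),  # T -> T * F
    4: ("T", 1),  # T -> F
    5: ("F", 3),  # F -> ( E )
    6: ("F", 1),  # F -> id
}

def _reduce_row(rule, tokens="+*)$"):
    """Build a table row that reduces by `rule` on each listed token."""
    return {t: ("reduce", rule) for t in tokens}

_START = {"id": ("shift", 5), "(": ("shift", 4)}

ACTION = {
    0: dict(_START),
    1: {"+": ("shift", 6), "$": ("accept", None)},
    2: {**_reduce_row(2, "+)$"), "*": ("shift", 7)},
    3: _reduce_row(4),
    4: dict(_START),
    5: _reduce_row(6),
    6: dict(_START),
    7: dict(_START),
    8: {"+": ("shift", 6), ")": ("shift", 11)},
    9: {**_reduce_row(1, "+)$"), "*": ("shift", 7)},
    10: _reduce_row(3),
    11: _reduce_row(5),
}

GOTO = {
    (0, "E"): 1, (0, "T"): 2, (0, "F"): 3,
    (4, "E"): 8, (4, "T"): 2, (4, "F"): 3,
    (6, "T"): 9, (6, "F"): 3,
    (7, "F"): 10,
}

def lr_parse(tokens):
    """Return the list of actions taken for `tokens`, which must end with '$'."""
    stack = [0]            # states only; grammar symbols are implied by states
    pos = 0
    trace = []
    while True:
        state, token = stack[-1], tokens[pos]
        entry = ACTION[state].get(token)
        if entry is None:
            raise SyntaxError(f"unexpected {token!r} in state {state}")
        kind, arg = entry
        if kind == "shift":
            trace.append(f"Shift {arg}")
            stack.append(arg)
            pos += 1
        elif kind == "reduce":
            lhs, rhs_len = RULES[arg]
            del stack[-rhs_len:]                  # pop one state per RHS symbol
            stack.append(GOTO[(stack[-1], lhs)])  # then consult GOTO
            trace.append(f"Reduce {arg}")
        else:
            trace.append("Accept")
            return trace

print(lr_parse(["id", "+", "id", "*", "id", "$"]))
```

Under these assumptions the printed action sequence matches the Action column row by row: each Reduce pops as many states as the rule's right-hand side has symbols and then pushes the state named by the GOTO entry.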
The table has 3 rows and 2 columns. The columns have the following headings from left to right, Point and Referencing Environment. Row 1, Point, 1 and Referencing Environment, local a and b, of sub1, and global g for reference, but not for assignment. Row 2, Point, 2 and Referencing Environment, local c, of sub2, global g for both reference and for assignment. Row 3, Point, 3 and Referencing Environment, nonlocal c, of sub2, local g, of sub3.
The table has 3 rows and 2 columns. The columns have the following headings from left to right, Point and Referencing Environment. Row 1, Point, 1 and Referencing Environment, a and b of sub 1, c of sub 2, d of main, where c of main and b of sub2 are hidden. Row 2, Point, 2 and Referencing Environment, b and c of sub 2, d of main, where c of main is hidden. Row 3, Point, 3 and Referencing Environment, c and d of main.
The list has 5 rows and 2 columns. From left to right, the columns represent the operator and its action. Row 1, operator, phi V and action, reverses the elements of V. Row 2, operator, phi M and action, reverses the columns of M. Row 3, operator, theta M and action, reverses the rows of M. Row 4, operator, O with a diagonal line through it M and action, transposes M so that its rows become its columns and vice versa. Row 5, operator, division symbol M and action, inverts M.
Ruby. Level 1, asterisk asterisk. Level 2, unary + and minus. Level 3, asterisk, slash, and percent sign. Level 4, binary + and minus. C based languages. Level 1, postfix + + and minus minus. Level 2, prefix + + and minus minus and unary + and minus. Level 3, asterisk, slash, and percent sign. Level 4, binary + and minus.
Ruby, Left, asterisk, slash, +, and minus. Right, asterisk asterisk. C based languages, Left, asterisk, slash, percent sign, binary +, and binary minus. Right, + +, minus minus, unary minus, and unary +.
C based languages. Row 1, postfix + + and minus minus. Row 2, unary + and minus, prefix + + and minus minus, and exclamation mark. Row 3, asterisk, slash, and percent sign. Row 4, binary + and minus. Row 5, less than, greater than, less than or equal, and greater than or equal. Row 6, equal and exclamation point equal. Row 7, ampersand ampersand. Row 8, pipe pipe, also called a vertical line.
Level 1, asterisk, slash, and not. Level 2, +, minus, ampersand, and mod. Level 3, unary minus. Level 4, equal, slash equal, less than, less than or equal, greater than or equal, greater than. Level 5, and. Level 6, or and xor. Associativity is from left to right.
The list has 6 rows and 2 columns. From left to right, the columns represent, the operation and a description. Row 1, operation, create left parenthesis stack right parenthesis and description, Creates and possibly initializes a stack object. Row 2, operation, destroy left parenthesis stack right parenthesis and description, Deallocates the storage for the stack. Row 3, operation, empty left parenthesis stack right parenthesis and description predicate, or Boolean function that returns true if the specified stack is empty and false otherwise. Row 4, operation, push left parenthesis stack comma element right parenthesis and description, Pushes the specified element on the specified stack. Row 5, operation, pop left parenthesis stack right parenthesis and description, Removes the top element from the specified stack. Row 6, operation, top left parenthesis stack right parenthesis and description, Returns a copy of the top element from the specified stack.
The list has 6 rows and 2 columns. From left to right, the columns represent, the expression and its value. Row 1, expression, 42 and value, 42. Row 2, expression, left parenthesis asterisk 3 7 right parenthesis and value, 21. Row 3, expression, left parenthesis + 5 7 8 right parenthesis and value, 20. Row 4, expression, left parenthesis minus 5 6 right parenthesis and value, negative 1. Row 5, expression, left parenthesis minus 15 7 2 right parenthesis and value, 6. Row 6, expression, left parenthesis minus 15 7 2 right parenthesis and value, 6.
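The values in the list above follow from applying each prefix operator to its operands from left to right. The sketch below illustrates that rule only; it is not a Scheme interpreter. Expressions are written as nested Python tuples, which is a hypothetical representation, and the evaluate function is an assumed name. Scheme's one-operand minus (negation) is deliberately not modeled.

```python
from functools import reduce
import operator

# Prefix operators covered by the list above.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def evaluate(expr):
    """Evaluate a number or a prefix expression such as ("-", 15, 7, 2)."""
    if isinstance(expr, (int, float)):
        return expr                          # a literal evaluates to itself
    op, *args = expr
    values = [evaluate(arg) for arg in args] # operands are evaluated first
    return reduce(OPS[op], values)           # then combined left to right

print(evaluate(42))               # 42
print(evaluate(("*", 3, 7)))      # 21
print(evaluate(("+", 5, 7, 8)))   # 20
print(evaluate(("-", 5, 6)))      # -1
print(evaluate(("-", 15, 7, 2)))  # 6
```

The last case shows why the value is 6: with more than two operands, subtraction takes the first operand and subtracts each of the rest in turn, 15 minus 7 minus 2.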
The list has 9 rows and 2 columns. From left to right, the columns represent, the Function and its Meaning. Row 1, function, equal sign and meaning, equal. Row 2, function, less than sign, greater than sign and meaning, not equal. Row 3, function, greater than sign and meaning, greater than. Row 4, function, less than sign and meaning, less than. Row 5, function, greater than sign, equal sign and meaning, greater than or equal. Row 6, function, less than sign, equal sign and meaning, less than or equal. Row 7, function, EVEN question mark and meaning, Is it an even number question mark. Row 8, function, ODD question mark and meaning, Is it an odd number question mark. Row 9, function, ZERO question mark and meaning, Is it zero question mark.
The list has 5 rows and 4 columns. From left to right, the columns represent the name, symbol, example, and meaning of the predicate calculus logical connectors. Row 1, name, negation, symbol, horizontal line with a short downward vertical line at its right end, followed by another short horizontal line backward to the left, example, horizontal line with a short downward vertical line at its right end, followed by another short horizontal line backward to the left a, and meaning, not a. Row 2, name, conjunction, symbol, upside down capital U, example, a upside down capital U b, and meaning, a and b. Row 3, name, disjunction, symbol, capital U, example, a capital U b, and meaning, a or b. Row 4, name, equivalence, symbol, an equal sign with three lines, example, a equal sign with three lines b, and meaning, a is equivalent to b. Row 5, name, implication, symbol, smaller left opening capital U, example, a smaller left opening capital U b, and meaning, a implies b. Additional example, a smaller right opening capital U b, and meaning, b implies a.
Predicate calculus quantifiers:

Name          Example   Meaning
universal     ∀X, P     For all X, P is true.
existential   ∃X, P     There exists a value of X such that P is true.
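Over a finite domain, the two quantifiers described above correspond closely to Python's built-in `all` and `any`. The sketch below is an illustration only: the helper names `for_all` and `exists` and the sample domain are assumptions, and predicate calculus itself places no finiteness restriction on the domain of X.

```python
def for_all(domain, predicate):
    """Universal quantifier: predicate holds for every X in domain."""
    return all(predicate(x) for x in domain)

def exists(domain, predicate):
    """Existential quantifier: predicate holds for at least one X in domain."""
    return any(predicate(x) for x in domain)

if __name__ == "__main__":
    domain = range(1, 11)          # hypothetical domain: the integers 1..10
    is_positive = lambda x: x > 0
    is_even = lambda x: x % 2 == 0

    print(for_all(domain, is_positive))  # True
    print(for_all(domain, is_even))      # False
    print(exists(domain, is_even))       # True
```

The two quantifiers are duals: "not for all X, P" holds exactly when "there exists an X such that not P" holds.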
Given that (1) if bob is the parent of jake, then bob is either the father or the mother of jake, and (2) if bob is the father of jake and jake is the father of fred, then bob is the grandfather of fred, it follows that if bob is the parent of jake and jake is the father of fred, then either bob is jake's mother or bob is fred's grandfather.
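The inference above can be confirmed by brute force: treat each of the five atomic statements as a Boolean variable and check that the conclusion is true in every assignment that satisfies both premises. This is a minimal sketch, assuming the truth-functional reading of "implies"; the function and parameter names are hypothetical, and exhaustive enumeration is used here only as a check, not as the inference method itself.

```python
from itertools import product

def implies(p, q):
    """Truth-functional implication: false only when p is true and q is false."""
    return (not p) or q

# The five atomic statements, in order:
#   parent       bob is the parent of jake
#   father       bob is the father of jake
#   mother       bob is the mother of jake
#   jake_father  jake is the father of fred
#   grandfather  bob is the grandfather of fred

def premises_hold(parent, father, mother, jake_father, grandfather):
    first = implies(parent, father or mother)
    second = implies(father and jake_father, grandfather)
    return first and second

def conclusion_holds(parent, father, mother, jake_father, grandfather):
    return implies(parent and jake_father, mother or grandfather)

def conclusion_follows():
    """True if the conclusion holds in every assignment satisfying the premises."""
    return all(
        conclusion_holds(*values)
        for values in product([True, False], repeat=5)
        if premises_hold(*values)
    )

if __name__ == "__main__":
    print(conclusion_follows())  # True
```

The conclusion is not a tautology on its own: it depends on the premises, since an assignment that makes the conclusion false necessarily violates one of them.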
Processor                  Web site
C, C++, Fortran, and Ada   gcc.gnu.org
C# and F#                  microsoft.com
Java                       java.sun.com
Haskell                    haskell.org
Lua                        (none listed)
Scheme                     www.plt-schemer.org/software/drscheme
Perl                       www.perl.com
Python                     www.python.org
Ruby                       www.ruby-lang.org

Note: JavaScript is included in virtually all browsers; PHP is included in virtually all Web servers. All this information is also included on the companion Web site.
Reviewer                  Institution
Aaron Rababaah            University of Maryland at Eastern Shore
Amar Raheja               California State Polytechnic University Pomona
Amer Diwan                University of Colorado
Bob Neufeldh              Wichita State University
Bruce R. Maxim            University of Michigan Dearborn
Charles Nicholash         University of Maryland Baltimore County
Cristian Videira Lopes    University of California Irvine
Curtis Meadow             University of Maine
David E. Goldschmidt      (none listed)
Donald Kraft              Louisiana State University
Duane J. Jarc             University of Maryland, University College
Euripides Montagne        University of Central Florida
Frank J. Mitropoulos      Nova Southeastern University
Gloria Melara             California State University Northridge
Hossein Saiedian          University of Kansas
I ping Chu                DePaul University
Ian Barland               Radford University
K N King                  Georgia State University
Karina Assiter            Wentworth Institute of Technology
Mark Llewellyn            University of Central Florida
Matthew Michael Burke     (none listed)
Michael Prentice          SUNY Buffalo
Nancy Tinkham             Rowan University
Neelam Soundarajan        Ohio State University
Nigel Gwee                Southern University Baton Rouge
Pamela Cutter             Kalamazoo College
Paul M. Jackowitz         University of Scranton
Paul Tymann               Rochester Institute of Technology
Richard M. Osborne        University of Colorado Denver
Richard M.                University of Texas at Dallas
Robert McCloskey          University of Scranton
Ryan Stansifer            Florida Institute of Technology
Salih Yurttas             Texas A&M University
Saverio Perugini          University of Dayton
Serita Nelesen            Calvin College
Simon H Lin               California State University Northridge
Stephen Edwards           Virginia Tech
Stuart C. Shapiro         SUNY Buffalo
Sumanth Yenduri           University of Southern Mississippi
Teresa Cole               Boise State University
Thomas Turner             University of Central Oklahoma
Tim R Norton              University of Colorado Colorado Springs
Timothy Henry             University of Rhode Island
Walter Pharr              College of Charleston
Xiangyan Zeng             Fort Valley State University